Abstract

In this paper, we propose a capacity-achieving communication scheme for a Gaussian finite-state 
Markov channel (FSMC) with noiseless output feedback and with channel state information, subject to 
an average channel input power constraint. This scheme is derived from a control theoretic perspective 
and is based on the connections between feedback communication and feedback control over an FSMC.
It also considerably reduces encoder and decoder complexity; shortens coding delay; has a doubly
exponential reliability function if the channel state behavior is "typical"; and nontrivially extends the
Schalkwijk-Kailath coding scheme for an additive white Gaussian noise (AWGN) channel to an FSMC.

Index Terms

Feedback communication, control-oriented feedback communication schemes, feedback capacity, 
feedback stabilization, finite-state Markov channels, channel state information 






I. Introduction 

There have been many achievements in the study of Markov channels, in which the time-varying 
fading gains (typically referred to as channel states) are modelled as Markov chains; see e.g. [1]-[8] and
references therein. [1] obtains the capacity of a finite-state Markov channel (FSMC) with independent and
identically distributed (i.i.d.) inputs and with a known channel transition structure but without channel
state information (CSI). [2] solves for the capacity of a Markov channel with instantaneous CSI at
both the transmitter (or encoder) and receiver (or decoder), or at the receiver only. [3] investigates the
capacity of a Markov channel with possibly imprecise or delayed CSI. [4] provides the capacity of
an FSMC with CSI delayed at the transmitter side and instantaneous at the receiver side (DTRCSI).
It also shows that, for a channel with DTRCSI, the transmitter's access to the channel output through
delayed feedback does not increase the capacity. [5] obtains the delay-constrained capacity for a
flat block-fading channel with causal feedback. [8] studies the capacity of an FSMC with instantaneous, 



perfect CSI available at the receiver side at all time instants and at the transmitter side periodically, once
every fixed number of time instants. The Markov channels studied in the above papers are mainly
fading channels with white noise and without inter-symbol interference (ISI) when conditioned on CSI.
More specifically, the channel state at each time is independent of the channel inputs up to that time if 
conditioned on past channel states, and the channel output at each time is independent of past channel 
inputs if conditioned on present channel state and input. Fading channels with ISI may also exhibit the 
Markov property and are closely related to the ISI-free Markov fading channels. For ISI channels with 
output feedback, [6] characterizes the capacity and capacity-achieving distribution. [7] investigates the
capacity of an FSMC with correlated inputs, with ISI, and with a known channel transition structure, but
without CSI. Finally, we refer to a recent paper [9] for the study of capacities for several general classes
of time-varying channels (block-memoryless, asymptotically block-memoryless, ISI, etc.) under a causal
CSI assumption (perfect or imperfect), as well as a list of references on the capacities of multi-input
multi-output (MIMO) fading channels.

In this paper, we present a capacity-achieving communication scheme for a single-input single-output 
(SISO) FSMC with additive white Gaussian noise (AWGN), under the assumption of delayed noiseless 
output feedback and DTRCSI, subject to an average channel input power constraint. We consider the 
case without ISI. Although the access to channel outputs by the transmitter cannot improve capacity in 
this scenario [4], we show that it leads to simpler encoders and decoders, shortens coding delays, and 
has a doubly exponential decay of the probability of decoding error, while achieving any rate below the 
feedback capacity. This scheme is among the first to achieve the feedback capacity of a fading channel 
with AWGN. 

Our scheme may be seen as a nontrivial extension of both the Schalkwijk-Kailath scheme (SK scheme) 
for the (non-fading) AWGN channel with output feedback [10], [11] and the optimal communication 
scheme over an FSMC without output feedback [2], [4]. In essence, our optimal feedback communication 
design for an FSMC consists of a set of decoupled subsystems running in parallel, and the subsystems are 
multiplexed to share the forward channel and feedback channel according to the channel state evolution. 
This provides the feedback communication system with the crucial ability to adapt its operations to the 
channel variation. Though the multiplexing idea was widely studied in communication without feedback, 
it has not received sufficient attention in the feedback communication literature. This is somewhat due to 
the fact that in feedback communication systems a considerably more complicated multiplexing design is 
often needed: While in communication without feedback, the subsystems are naturally decoupled from 
each other, in communication with delayed feedback, the decoupling of subsystems is possible only 
through appropriate designs. It turns out that, to ensure decoupling and the optimality in the feedback 
case, each subsystem needs to depend on an "augmented" channel state. In the case of m possible 
channel state values with one-step delayed output feedback, the transmitter needs to switch among $m^2$
possible sets of parameters, as opposed to the m required in the multiplexing design for communication
without feedback ([2], [4]).

Our design simplifies when the channel states form an i.i.d. process taking a finite number of values:
it then requires only a simple first-order transmitter and receiver, without the need for multiplexing or
channel state augmentation. We also show that the design extends to the case when the channel states
are i.i.d. taking an uncountably infinite number of values, including as special cases the widely used
Rayleigh, Rician, Nakagami, and Weibull fading channel models.

Besides the novel scheme and its information theoretic properties, the other main contribution of this 
paper lies in the approach followed to derive the scheme, which is based on the equivalence between the 
feedback communication problem over an FSMC and a related feedback stabilization problem that has 
the FSMC in the loop. In particular, we show that the proposed communication scheme is associated
with a certain optimal stabilization problem of a Markov Jump Linear System (MJLS) over the FSMC.
In fact, the control system framework is considerably simpler to understand and design, and from it, the much
less intuitive feedback communication system can be simply derived. To be more specific, if the MJLS,
unstable in open loop, is stabilized in closed loop, then the communication system can transmit at
a signalling rate with asymptotically vanishing probability of error (i.e., stabilization implies reliable
communication), and the supremum signalling rate is solely determined by how fast the state of the MJLS
grows in open loop^1. The transmission power in the communication system can also be determined from
the MJLS by solving an optimal control problem called cheap control of the minimum variance linear
quadratic Gaussian (LQG) problem. That is, the optimality in the communication problem coincides
with that in the control problem.

We refer to [13] for the study of MJLS; [12], [14]-[16], and Sec. III-A and Sec. III-E in the present
paper for the cheap control, or the closely related minimum-energy control, over time-invariant channels;
and [12], [16]-[26] for some studies on the interactions between information and control. For completeness,
we present a brief review of the interactions between information and control. [20] formulates the
feedback communication problem as a stochastic optimal control problem, and provides a dynamic
programming based solution. [23] shows the fundamental connections between the communication of
non-stationary, non-ergodic sources and the stabilization of unstable systems. [12] establishes, over a 
time-invariant Gaussian channel, the equivalence of feedback communication and feedback stabilization 
problems, and that the optimality in the two problems coincides. The present paper is mainly along the 
line of [12] and is focused on the extension to time-varying Gaussian channels. 

We remark that the control-oriented feedback communication design approach, in the simplest case of
AWGN channels, can give rise to an SK-type scheme. Compared with approaches attempting to directly
extend the SK scheme, the control-oriented approach has been shown to be a more systematic yet
simpler way of addressing a variety of feedback communication problems, as the control systems
associated with the communication schemes are usually easier to work with; see [27] for a survey. On
the other hand, many feedback communication schemes have been derived by extending the SK scheme,
more or less directly. For a partial list of the feedback communication designs based on the idea of
Schalkwijk and Kailath, see [12], [18], [23], [27]-[35] and references therein.

This paper is organized as follows. Section II introduces the channel model and the problem we want
to solve. Section III describes the proposed feedback communication scheme. This scheme is shown
to achieve any rate below the capacity in Section IV. In Section V, we present a numerical example.
In Section VI, we conclude the paper. The presentation of the above results is focused on the case of
one-step-delayed feedback; the extension to the case of multi-step-delayed feedback, having a much
more complex form but bearing essentially the same idea of introducing appropriate multiplexing and
augmented channel states, is skipped in this paper for brevity (see [27] for details).

Notations: We represent time indices by subscripts, such as $A_k$. To conform with the convention in
dynamical systems, the time index starts from 0. We denote by $A_n^m$ the sequence $\{A_n, A_{n+1}, \ldots, A_m\}$,
and by $\{A_k\}$ the infinite sequence $\{A_k\}_{k=0}^{\infty}$. We use boldface letter $\mathbf{x}$ for a vector, and $x^{(i)}$ for the ith
element of vector $\mathbf{x}$. Note that $A_n^m$ is a sequence, $(A_n)^m$ is the mth power of $A_n$, $\mathbf{A}_n$ is a vector with
time index n, and $A_n^{(m)}$ is the mth element of vector $\mathbf{A}_n$. We use a[1], a[2], ... to represent a collection
of fixed numbers. We denote "defined to be" as ":=".

^1 We will show that the supremum achievable signalling rate in the communication system equals the average open-loop
growth rate in the MJLS. When the MJLS is open-loop unstable, the system state $\mathbf{x}_k$ grows as the time index k increases.
The average open-loop growth rate may be defined as $\lim_{k\to\infty} (1/k) \log |\prod_{j=1}^{m} (x_k^{(j)} / x_0^{(j)})|$, where $\mathbf{x}_k$ has m dimensions and
$x_k^{(j)}$ denotes the jth dimension. In the case of a scalar unstable LTI system defined as $x_{k+1} = a x_k$ and $y_k = c x_k$ with
$|a| > 1$, the growth rate becomes $\log |a|$, which is equal to the degree of instability introduced in [12]. Note that in [12] the
degree of instability for a multi-dimensional time-invariant system $\mathbf{x}_{k+1} = A \mathbf{x}_k$ is defined as $\log \prod_i |\lambda_{u,i}(A)|$, where $\lambda_{u,i}(A)$
denotes an unstable eigenvalue of A; the degree of instability is shown to equal the supremum achievable rate in the associated
communication system in [12]. In this paper we will extend this notion to multi-dimensional time-varying systems. See Sec.
III-A and Sec. III-E for more discussions.
II. Channel Model

Fig. 1 (a) shows the forward channel considered in this paper, that is, an FSMC with AWGN, or
AFSMC for short. At time k, this discrete-time channel $\mathcal{F}$ is described as

$$y_k = S_k u_k + N_k, \quad \text{for } k = 0, 1, 2, \ldots, \qquad (1)$$

where $u_k$ is the channel input, $S_k$ is the channel state, $N_k$ is the channel noise, and $y_k$ is the channel
output. These variables are real-valued. The noise $\{N_k\}$ is independent Gaussian with zero mean and
unit variance. The channel state $S_k$ is independent of the channel inputs $u_0^k$ and outputs $y_0^{k-1}$ when
conditioned on the previous states $S_0^{k-1}$. Furthermore, $\{S_k\}$ is a stationary, irreducible, aperiodic,
finite-state homogeneous Markov chain and hence is ergodic. The one-step transition probability is

$$p_{ij} := \Pr(S_k = s[j] \mid S_{k-1} = s[i]), \quad \text{for } k = 1, 2, \ldots,$$

where $i, j = 1, 2, \ldots, m$; m is the number of possible state values of the Markov chain; and s[i] is a
fixed number for each i. Assume that $s[i] \neq s[j]$ if $i \neq j$. Note that s[i] denotes one of the m states
of the Markov chain, and also represents the associated channel gain if the channel is in that state.
Denote the stationary distribution vector of the Markov chain (which by ergodicity exists and is unique)
as $\pi := [\pi[1], \pi[2], \ldots, \pi[m]]$.
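As an illustration of the channel model above, the following minimal Python sketch computes the stationary distribution of a small transition matrix by power iteration and simulates the channel output. The two-state transition matrix, the gains, the constant input, and the function names are hypothetical values chosen only for this example; they are not taken from the paper.

```python
import random

def stationary_distribution(P, iters=1000):
    """Power iteration pi <- pi P for an ergodic finite-state chain."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def simulate_afsmc(P, s, inputs, rng):
    """Return (state indices, outputs) for y_k = S_k u_k + N_k, N_k ~ N(0, 1)."""
    m = len(P)
    pi = stationary_distribution(P)
    state = rng.choices(range(m), weights=pi)[0]   # start in the stationary law
    states, outputs = [], []
    for u in inputs:
        states.append(state)
        outputs.append(s[state] * u + rng.gauss(0.0, 1.0))
        state = rng.choices(range(m), weights=P[state])[0]
    return states, outputs

# Hypothetical two-state example (illustrative values only).
P = [[0.9, 0.1], [0.3, 0.7]]   # P[i][j] = Pr(S_k = s[j] | S_{k-1} = s[i])
s = [1.0, 0.2]                 # channel gains of the two states
pi = stationary_distribution(P)
states, outputs = simulate_afsmc(P, s, [1.0] * 1000, random.Random(0))
```

For this particular matrix the balance equation $0.1\,\pi[1] = 0.3\,\pi[2]$ gives $\pi = [0.75, 0.25]$, which the power iteration reproduces.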

Definition 1. i) Define the set of all possible channel state sequences $\{S_k\}$ as $\Omega$, i.e.,

$$\Omega := \{\{S_k\}\}. \qquad (2)$$

ii) Define the set of all possible channel state sequences $S_0^k$ as $\Omega_k$, i.e.,

$$\Omega_k := \{S_0^k\}. \qquad (3)$$

iii) Define the $(k, \mu)$-typical set of sequences $S_0^k$ as

$$\Omega_{k,\mu} := \left\{ S_0^k : \left| \frac{n(j,k)}{k+1} - \pi[j] \right| < \mu \ \text{and} \ \left| \frac{n(j,l,k)}{n(j,k)} - p_{jl} \right| < \mu \right\}, \qquad (4)$$

where $j, l = 1, 2, \ldots, m$; $\mu > 0$; and for a given channel state sequence $S_0^k$, $n(j,l,k)$ is the number of
transitions from channel state s[j] to channel state s[l] up to time k, and $n(j,k) := \sum_{l=1}^{m} n(j,l,k)$.^2 □

Note that the typicality defined above is in fact strong typicality [36]. By the Weak Law of Large
Numbers or the (strong) Asymptotic Equipartition Property (AEP), it holds that

$$\Pr(\Omega_{k,\mu}) \to 1 \qquad (5)$$

as k tends to infinity, and that

$$\frac{n(j,k)}{k+1} \xrightarrow{p} \pi[j], \qquad \frac{n(j,l,k)}{n(j,k)} \xrightarrow{p} p_{jl}, \qquad \frac{n(j,l,k)}{k+1} \xrightarrow{p} \pi[j]\, p_{jl}. \qquad (6)$$

Here $\xrightarrow{p}$ specifies convergence in probability. Note that if $a_k \xrightarrow{p} a$ and $b_k \xrightarrow{p} b$, then $a_k b_k \xrightarrow{p} ab$ and
$f(a_k) \xrightarrow{p} f(a)$ for any continuous function f; see Lemma 3.3 (i.e. the Continuous Mapping Theorem)
and Corollary 3.5 in [37].
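The convergence of the transition frequencies can be checked empirically. The sketch below counts transitions in a simulated state sequence and compares the empirical frequencies with the transition probabilities; the transition matrix, sequence length, and helper names are assumptions made for illustration only.

```python
import random

def transition_counts(states, m):
    """n[j][l] = number of transitions from state j to state l in the sequence."""
    n = [[0] * m for _ in range(m)]
    for prev, cur in zip(states, states[1:]):
        n[prev][cur] += 1
    return n

def empirical_transition_matrix(states, m):
    """Row j holds n(j, l, k) / n(j, k), with n(j, k) = sum_l n(j, l, k)."""
    n = transition_counts(states, m)
    rows = []
    for j in range(m):
        total = sum(n[j])
        rows.append([n[j][l] / total if total else 0.0 for l in range(m)])
    return rows

# Hypothetical two-state chain (illustrative values only).
P = [[0.9, 0.1], [0.3, 0.7]]
rng = random.Random(1)
state, states = 0, []
for _ in range(100_000):
    states.append(state)
    state = rng.choices(range(2), weights=P[state])[0]

P_hat = empirical_transition_matrix(states, 2)
max_dev = max(abs(P_hat[j][l] - P[j][l]) for j in range(2) for l in range(2))
```

For a long sequence, `max_dev` is small, which is the behaviour the typical set is meant to capture: most long state sequences have transition frequencies close to the transition probabilities.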

^2 To simplify notations, we do not specify the dependency of $n(j,l,k)$ and $n(j,k)$ on the given sequence $S_0^k$ here. If $S_0^k$ is
viewed as a realization, then $n(j,l,k)$ and $n(j,k)$ are not random variables, in which case one may use the notations $S_0^k(\omega_0)$,
$n(j,l,k)(\omega_0)$, and $n(j,k)(\omega_0)$, where $\omega_0$ is one fixed sample point in the sample space $\Omega_k$. If, however, $S_0^k$ is viewed as a
random vector, then $n(j,l,k)$ and $n(j,k)$ are to be viewed as random variables, in which case one may use the notations
$S_0^k(\omega)$, $n(j,l,k)(\omega)$, and $n(j,k)(\omega)$ that are functions of $\omega$ defined on the sample space $\Omega_k$.
We mainly focus on channel F with the following assumptions in this paper:

Definition 2. Define an AFSMC $\mathcal{F}$ with one-step-delayed output feedback and DTRCSI as channel F.

In other words, we consider the case of one-step-delayed transmitter-side and instantaneous receiver-side
CSI (DTRCSI). We also allow the transmitter to have access to the one-step-delayed channel output,
i.e., the receiver at time k, having observed $y_k$, computes a feedback signal (depending only on the observed
channel outputs) and feeds it back along with $S_k$ to the transmitter with one-step delay. In other words,
the channel input $u_k$ can depend on $S_0^{k-1}$ and on the feedback signals up to time k-1. See Fig. 1 (b).
Fig. 1. (a) An AFSMC $\mathcal{F}$. (b) A system over channel F.



The capacity for channel F is characterized in [4] as

$$C = \max_{\Pr(u_k \mid S_{k-1})} E_{S_{k-1}, S_k}\, I(u_k; y_k \mid S_{k-1}, S_k), \qquad (7)$$

where $\Pr(u_k \mid S_{k-1})$ is any input distribution subject to the average channel input power constraint

$$E u_k^2 \le \mathcal{P}. \qquad (8)$$

Note that $S_k$ also follows the stationary distribution $\pi$ since $S_{k-1}$ does. This optimization problem can
be reduced to

$$C = \max_{\Gamma(\cdot):\ E\,\Gamma(S_{k-1}) \le \mathcal{P}} E_{S_{k-1}, S_k}\, \frac{1}{2} \log\left(1 + S_k^2\, \Gamma(S_{k-1})\right), \qquad (9)$$

where $\Gamma(\cdot)$ is the power allocation function that maps the channel state $S_{k-1}$ to the channel input power
$\Gamma(S_{k-1})$. The above capacity formula is obtained by invoking Lemma 2 of [4] (with d = 1 and = 1
therein). The optimal power allocation, denoted $\gamma(\cdot)$, is given by the solution of a set of m equations^3
derived by applying the Kuhn-Tucker condition (see Appendix B in [4]) and is assumed given throughout
this paper. The objective of the paper is to design a transmitter and receiver for channel F to achieve
any rate below the capacity given in (9).
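To make the role of the power allocation concrete, the sketch below evaluates the average rate for a given allocation and runs a coarse grid search for a two-state chain. It assumes that the per-step rate is $\frac{1}{2}\log(1 + S_k^2\,\Gamma(S_{k-1}))$ in nats, which is the form implied by the gain chosen later in (20); the grid search is only a stand-in for the Kuhn-Tucker solution of [4], and all numerical values and function names are hypothetical.

```python
import math

def rate_for_allocation(P, pi, s, gamma):
    """Average of 0.5*log(1 + s[j]^2 * gamma[i]) over (S_{k-1}, S_k) = (s[i], s[j])."""
    m = len(s)
    return sum(pi[i] * P[i][j] * 0.5 * math.log(1.0 + s[j] ** 2 * gamma[i])
               for i in range(m) for j in range(m))

def best_two_state_allocation(P, pi, s, power, grid=2000):
    """Grid search over allocations with pi[0]*g0 + pi[1]*g1 = power (m = 2 only)."""
    best_rate, best_gamma = -1.0, None
    for n in range(grid + 1):
        g0 = (power / pi[0]) * n / grid
        g1 = max((power - pi[0] * g0) / pi[1], 0.0)
        rate = rate_for_allocation(P, pi, s, [g0, g1])
        if rate > best_rate:
            best_rate, best_gamma = rate, [g0, g1]
    return best_rate, best_gamma

# Hypothetical two-state example (illustrative values only).
P = [[0.9, 0.1], [0.3, 0.7]]
pi = [0.75, 0.25]
s = [1.0, 0.2]
uniform_rate = rate_for_allocation(P, pi, s, [1.0, 1.0])
best_rate, best_gamma = best_two_state_allocation(P, pi, s, power=1.0)
```

The search can only match or improve on the uniform allocation, because the uniform allocation lies on the grid (up to floating-point rounding).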

Note that the channel defined above has a discrete channel state but continuous channel input, noise,
and output (in contrast to the discreteness of FSMC inputs and outputs assumed in [1]-[6], with the
notable exception of some parts in [4]). This channel may be used in modelling the following cases
and their generalizations. First, a channel may be subject to both erasure (i.e. discrete channel states) and
AWGN (i.e. continuous noise), and the erasures may exhibit certain time correlation (e.g. forming a
2-state Markov chain) as the causes of erasures may be time-correlated in some cases (such as buffer
overflow). Second, a channel may be subject to bursty noises with different noise variances, and the
occurrence of bursty noises forms a finite-state Markov chain. The well-known Gilbert-Elliott channel
with AWGN falls into this category.

^3 The equations are not linear but the nonlinearity involves fractions only. Hence, they can be easily solved numerically.
Besides, the decision variables $\Gamma(s[i])$, $i = 1, \ldots, m$, are inside a compact region and a number of numerical approaches,
such as branch-and-bound, are available to improve the search efficiency.
We also remark that the assumption of instantaneous, perfect CSI at the receiver side, though often
made in the literature (see [2], [4], [38], etc.), is not quite realistic (especially in the fast fading case);
we adopt this assumption in order to simplify the analysis and to gain some conceptual understanding
of the feedback communication problem over Markov channels. A study that takes imperfect CSI into
consideration is left for future work, and our present research based on perfect CSI may be
found useful in that study.

III. The proposed feedback communication scheme 

In this section, we propose a communication scheme for channel F. After a brief review of a feedback 
communication design for an AWGN channel and a description of the main idea of our design, we 
introduce the communication scheme and some of its properties. This scheme will be shown to achieve 
any rate below the feedback capacity in Section IV.

A. Review of an SK-type communication system for an AWGN channel with feedback 

To better explain the proposed scheme, we review a feedback communication scheme over an AWGN
channel; see e.g. [10]-[12], [18], [39] for more details. Fig. 2 shows the communication system, which
can achieve any rate below the Shannon capacity of the AWGN channel. The coding process is as
follows. Fix a coding length (K + 1), a power budget $\mathcal{P} > 0$, and any $\epsilon > 0$ (where $\epsilon$ is an arbitrarily
small slack from the Shannon capacity). Define

$$a := \sqrt{1 + \mathcal{P}}, \qquad b := a - \frac{1}{a}, \qquad c := 1, \qquad (10)$$

and

$$M_K := a^{(K+1)(1-\epsilon)}. \qquad (11)$$

Equally partition the interval $[-\frac{1}{2}, \frac{1}{2}]$ into $M_K$ sub-intervals, and map the sub-interval centers to a set
of $M_K$ equally likely messages; this is known to both the transmitter and receiver a priori. Suppose
that we wish to transmit the message represented by the center W. Let $y_{-1} := 0$ and $x_{-1} := W/a$,
i.e., $x_0 := W$. In other words, the initial condition (at time 0) of the transmitter is the to-be-transmitted
message. Generate $\hat{x}_{0,k}$ according to the following dynamics

$$x_k = a x_{k-1} - b y_{k-1}$$
$$u_k = c x_k$$
$$y_k = u_k + N_k \qquad (12)$$
$$\hat{x}_{0,k} = \hat{x}_{0,k-1} + a^{-k-1} b y_k.$$

We then decode, at the decoder side, by mapping $\hat{x}_{0,k}$ into the closest sub-interval center.

One can show that the probability of decoding error vanishes as k tends to infinity. In addition, the
average input power $E\left((u_0^k)' u_0^k\right)/(k+1)$ converges to $\mathcal{P}$. Since

$$R := \lim_{K\to\infty} \frac{1}{K+1} \log M_K = (1-\epsilon) \log a = \frac{1-\epsilon}{2} \log(1 + \mathcal{P}), \qquad (13)$$

it holds that any rate below the Shannon capacity is achievable.

We can also transmit a Gaussian random variable over the AWGN channel. Suppose $x_0 \sim \mathcal{N}(0, \mathcal{P})$
(i.e. $x_{-1} \sim \mathcal{N}(0, \mathcal{P}/a^2)$). Following the choice of parameters in (10) and the dynamics in (12), one
obtains that

$$\mathrm{MSE}(\hat{x}_{0,k-1}) := E(x_0 - \hat{x}_{0,k-1})^2 = \mathcal{P} a^{-2k}. \qquad (14)$$

This is indeed the minimum mean-squared error (MSE) that we can achieve over this channel, since this
channel can transmit at a rate no higher than $\log a$, which corresponds to the (minimum) MSE distortion
$\mathcal{P} a^{-2k}$. In other words, $x_0$ is successively refined at the receiver at the Shannon capacity rate; see e.g.
[18], [23], [40] for related study.
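The SK-type recursion reviewed above can be simulated in a few lines. The sketch below is a minimal illustration, assuming unit-variance Gaussian noise drawn from Python's `random` module; the function name, the message value, and the horizon are hypothetical. It also exposes the algebraic identity behind the scheme: the estimation error after step K equals $a^{-(K+1)} x_{K+1}$, where $x_{K+1}$ is the next transmitter state.

```python
import math
import random

def sk_type_scheme(W, power, K, rng):
    """Run the recursion for k = 0..K.

    Returns (x0_hat, x_next, a): the receiver estimate after step K,
    the next transmitter state x_{K+1}, and the open-loop gain a.
    """
    a = math.sqrt(1.0 + power)
    b = a - 1.0 / a
    c = 1.0
    x_prev, y_prev = W / a, 0.0            # x_{-1} = W/a, y_{-1} = 0
    x0_hat = 0.0                           # receiver estimate before any output
    for k in range(K + 1):
        x = a * x_prev - b * y_prev        # transmitter state x_k
        u = c * x                          # channel input u_k
        y = u + rng.gauss(0.0, 1.0)        # channel output y_k
        x0_hat += a ** (-(k + 1)) * b * y  # receiver update
        x_prev, y_prev = x, y
    x_next = a * x_prev - b * y_prev       # x_{K+1}
    return x0_hat, x_next, a

W = 0.25                                   # hypothetical message point
x0_hat, x_next, a = sk_type_scheme(W, power=1.0, K=30, rng=random.Random(0))
error = W - x0_hat                         # equals a**(-31) * x_next up to rounding
```

Because the closed loop keeps $x_k$ bounded in variance while the factor $a^{-(K+1)}$ decays geometrically, the error shrinks at the rate $\log a$ per channel use.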
Fig. 2. The communication system for an AWGN channel. $x_{-1} = W/a$, $x_0 = W$, $\hat{x}_{0,-1} = 0$, $N_k \sim \mathcal{N}(0, 1)$, $a > 1$, and
$\mathcal{P} > 0$. The system inside the dotted box represents a closed-loop control system.



The optimality in both the digital and analog communication problems is associated with the optimality
in a control problem over the control setup as indicated in Fig. 2. Suppose in the control setup, for any
fixed a > 1 and c, we wish to minimize the power of u by appropriately designing b (noticing that when
the power of u is finite then the control setup is stabilized in closed loop) according to the solution to
a classical control problem known as cheap control of minimum variance LQG problem ([12], [14],
[41], [42]). Precisely, the dynamics of the scalar LTI control system is

$$x_k = a x_{k-1} - b y_{k-1}$$
$$u_k = c x_k \qquad (15)$$
$$y_k = u_k + N_k$$

where $x_k$ is the system state with an unknown initial condition $x_0$, $u_k$ is the system output, $y_k$ is the
measurement of the system output (corrupted by noise $N_k$) and the input to the controller, $(-b)$ is the
controller gain, and $(-b y_k)$ is the controller's output (also known as the control input to the system
that drives the system state). Here a and c are given, and we need to find $(-b)$ to minimize the output
power. That is, we need to solve the following control problem

$$\min_b \lim_{k\to\infty} \frac{1}{k+1} E \sum_{t=0}^{k} (u_t)^2, \qquad (16)$$

in which the control effort is free as there is no penalty on the controller's output $(-b y_k)$; hence the
name "cheap control"^4.

Note that $x_k = a^k x_0$ when the control system is in open loop (i.e. b = 0), and hence, we may define
the growth rate for this unstable system as

$$\frac{1}{k} \log \frac{x_k}{x_0} = \log a. \qquad (17)$$

^4 This cheap control problem may be reformulated as an expensive control problem (depending on whether one treats b or
c as the controller). Treating u as the controller's output, c as the state-feedback controller gain to be designed, and b as
given, one obtains an equivalent optimal control problem to minimize the power of the control effort (subject to closed-loop
stability); hence the name "expensive control". The noiseless version of the expensive control is a minimum-energy control
problem as the control energy in infinite horizon can be made finite. In both cheap control and expensive control (as well as the
minimum-energy control), the optimal controllers place the closed-loop eigenvalues at the reciprocals of open-loop eigenvalues.
In [12], the expensive control formulation (as opposed to cheap control) was under study when investigating the relationship
between feedback control and feedback communication.
The optimal solution to the above control problem is to choose b as $(a - 1/a)$, which places the closed-loop
eigenvalue at the reciprocal of the open-loop eigenvalue^5; the supremum communication rate is
$\log a$ (cf. (13)), equal to the growth rate of the unstable open-loop system of the control setup, and the cheap
control guarantees that the minimum transmission power is used to achieve any rate arbitrarily close
to $\log a$. These ideas were extended to feedback Gaussian channels with memory in [12], and will be
extended to an AFSMC with feedback as we will show in this paper.
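The optimal gain can also be checked numerically. The following sketch evaluates the stationary output power of the closed loop, $b^2 / (1 - (a - b)^2)$ for c = 1, on a grid of stabilizing gains and locates its minimum; the grid resolution and function names are assumptions made for illustration.

```python
import math

def stationary_input_power(a, b, c=1.0):
    """Stationary E[u^2] for x_k = (a - b c) x_{k-1} - b N_{k-1}, u_k = c x_k."""
    pole = a - b * c
    if abs(pole) >= 1.0:
        return math.inf                    # closed loop is not stable
    return (c * b) ** 2 / (1.0 - pole ** 2)

def grid_minimizer(a, steps=200_000):
    """Search the stabilizing gains b in (a - 1, a + 1) for the minimum power (c = 1)."""
    lo, hi = a - 1.0, a + 1.0
    best_b, best_p = None, math.inf
    for n in range(1, steps):
        b = lo + (hi - lo) * n / steps
        p = stationary_input_power(a, b)
        if p < best_p:
            best_b, best_p = b, p
    return best_b, best_p

a = math.sqrt(2.0)                         # corresponds to a power budget of 1
b_star, p_star = grid_minimizer(a)
```

With $a = \sqrt{2}$ the search returns a gain close to $a - 1/a \approx 0.7071$ and a power close to $a^2 - 1 = 1$, in agreement with the closed-form solution.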

We point out that our formulation of the communication scheme as described above differs from 
other popular SK-type feedback communication schemes in their original forms. We comment on the 
relationship and differences between these formulations; these comments also apply to the proposed 
scheme for an AFSMC and its variations (see Table II). First, the above-mentioned formulation does
not involve unbounded coding parameters or unbounded signal power, whereas that in [10] involves
exponentially growing bandwidth, [11] involves an exponentially growing parameter $a^k$ where a > 1 and
k denotes the time index, and [12] generates a feedback signal with exponentially growing power, despite
the facts that they all generate the same channel inputs, same outputs, and same decoded messages,
and that one formulation can be obtained as a simple reformulation of the others. Thus we consider our
formulation more feasible (at least for simulation purposes, if not practically implementable). We also
remark that our formulation is essentially the scheme studied by Gallager (p. 480, [39]), which may
deserve more attention in the literature, with minor differences. In addition, our formulation
differs from the original SK scheme in that ours performs the same operation at every step, whereas the
original SK formulation performs a startup operation that differs from later steps. Although ours has the
advantage of unifying the operations for all steps (which simplifies the control-oriented analysis; also
cf. Section IV in [12]), it has to remove a bias term using an extra equation when used to transmit digital
messages (as done in [39]; see the corresponding equation on p. 481) or wait long enough until
that exponentially vanishing bias becomes negligible (see comments in Section IV of [12]). In contrast,
the original SK scheme is unbiased since the special startup operation eliminates the bias. In this paper,
we focus on our formulation and its extension since it corresponds to a control system that is easier to
analyze.

B. General description of the proposed scheme 

We present an informal overview of the proposed scheme before going into the technical details. 
In short, the proposed design can be viewed as a process of multiplexing a set of subsystems with 
feedback, each of them using an augmented channel state. In the degenerate case where the channel is
time-invariant, the design can be simplified to the one described in the previous subsection for AWGN 
channels. 

Suppose that the Markov channel $\mathcal{F}$ has m possible state values. Then our scheme consists of m
subsystems (each of which is associated with one channel state value) sharing the forward channel
$\mathcal{F}$. Represent each to-be-transmitted message as an m-dimensional codeword which contains m
sub-codewords, and the m sub-codewords uniquely determine the message. Then assign the ith sub-codeword
to be the initial condition of the ith subsystem (see Section III-H and Fig. 3). This completes the
"encoding" process. 

^5 We provide a proof sketch here. When (15) is stabilized in closed loop, the power $E(u_k)^2$ converges to $b^2/(1 - (a - b)^2)$.
Then it is straightforward to compute that $b = (a - 1/a)$ is the minimizer and the minimum power is $(a^2 - 1)$.
Then the system operates according to its designed dynamics (see (18) and Fig. 3), in a way that the m
subsystems are multiplexed and each of the m sub-codewords is communicated without the interference
from other sub-codewords. To ensure this, at each time epoch, one and only one subsystem is selected
to use the forward channel to send its sub-codeword, based on the two immediately previous channel states
(i.e. an augmented channel state; see the dependence on $S_{k-2}$ and $S_{k-1}$ at time k in Fig. 3). The
introduction of the augmented channel state leads to the decoupling of the m subsystems, which eases
decoding without loss of optimality. Note that the decoupling is impossible (except for the degenerate
case in which $\{S_k\}$ is i.i.d.) if the transmitter only uses the immediate past channel state in the multiplexing.
Note also that state augmentation is widely used for control systems with delay. See Section III-F for
the details regarding the decoupling of subsystems.

To decode, each subsystem performs its decoding process independently. Then the original message
is correctly recovered if all m sub-codewords are correctly recovered. Therefore, the average
rate of our scheme is the weighted sum of the individual rates for the subsystems, and the weights are
the probabilities of the subsystems being selected to use the forward channel. Besides, the decoupling
and the cheap control design (see Section III-E) ensure that the channel input power at time k depends
on the channel state at time (k - 1) only, and the power of the ith subsystem is designed to converge to
$\gamma(s[i])$, resulting in the average power converging to the weighted sum of $\gamma(s[i])$, which is the optimal
power (see Sec. IV-B). To summarize, each subsystem achieves a rate arbitrarily close to its capacity
by applying the corresponding cheap control solution, and the overall scheme achieves a rate arbitrarily
close to the capacity of channel F.



C. Communication scheme 

Fig. 3 shows the proposed communication scheme. Its seemingly complicated and non-intuitive
operations may be obtained easily as a simple reformulation of a control system; see Section III-E
for details. In this figure, we identify the transmitter, the channel $\mathcal{F}$, and the receiver in the dashed
boxes. We call $\mathbf{x}_k \in \mathbb{R}^m$ the transmitter state, with $\mathbf{x}_0$ being the initial condition. We call $\hat{\mathbf{x}}_{0,k} \in \mathbb{R}^m$ the
receiver estimate, which is an estimate of $\mathbf{x}_0$ at time k. Parameters $A \in \mathbb{R}^{m \times m}$, $\mathbf{b} \in \mathbb{R}^m$, and $\mathbf{c} \in \mathbb{R}^m$
depend causally on the channel states, and will be chosen to reflect the adaptation of the communication
strategy to the channel variation.



Fig. 3. The communication scheme and the control setup.

At time k, $k \ge 0$, the system generates signals according to the following dynamics in the listed
order:

$$\mathbf{x}_k = A(S_{k-2}, S_{k-1}) \mathbf{x}_{k-1} - \mathbf{b}(S_{k-2}, S_{k-1}) y_{k-1}$$
$$u_k = \mathbf{c}(S_{k-1})' \mathbf{x}_k$$
$$y_k = S_k u_k + N_k \qquad (18)$$
$$\hat{\mathbf{x}}_{0,k} = \hat{\mathbf{x}}_{0,k-1} + \left( \prod_{j=0}^{k} A(S_{j-1}, S_j) \right)^{-1} \mathbf{b}(S_{k-1}, S_k) y_k,$$

where $S_{-2} := s[1]$, $S_{-1} := s[1]$, $y_{-1} := 0$, $\mathbf{x}_{-1} := A(S_{-2}, S_{-1})^{-1} \mathbf{x}_0$, and $\hat{\mathbf{x}}_{0,-1} := 0$. The above
recursions will generate a sequence of receiver estimates $\{\hat{\mathbf{x}}_{0,k}\}$ that converges to $\mathbf{x}_0$, as we will prove
in the next section.



D. Choice of parameters 

Given any $\mathcal{P} > 0$, we choose the parameters in the communication setup as follows. Let $\gamma(\cdot)$ be
the optimal power allocation function computed from [4]. Supposing that $S_{k-2} = s[j]$ for some j and
$S_{k-1} = s[l]$ for some l, we define

$$A(S_{k-2}, S_{k-1}) := \mathrm{diag}([1, \ldots, 1, a(S_{k-2}, S_{k-1}), 1, \ldots, 1]) \in \mathbb{R}^{m \times m}$$
$$\mathbf{b}(S_{k-2}, S_{k-1}) := [0, \ldots, 0, b(S_{k-2}, S_{k-1}), 0, \ldots, 0]' \in \mathbb{R}^m \qquad (19)$$
$$\mathbf{c}(S_{k-1}) := [0, \ldots, 0, c(S_{k-1}), 0, \ldots, 0]' \in \mathbb{R}^m,$$

where $a(S_{k-2}, S_{k-1})$ is the (j, j)th element of $A(S_{k-2}, S_{k-1})$, given by

$$a(S_{k-2}, S_{k-1}) := \sqrt{(S_{k-1})^2 \gamma(S_{k-2}) + 1}; \qquad (20)$$

$b(S_{k-2}, S_{k-1})$ is the jth element of $\mathbf{b}(S_{k-2}, S_{k-1})$, given by

$$b(S_{k-2}, S_{k-1}) := \frac{\gamma(S_{k-2}) S_{k-1}}{\sqrt{(S_{k-1})^2 \gamma(S_{k-2}) + 1}} = \frac{\gamma(S_{k-2}) S_{k-1}}{a(S_{k-2}, S_{k-1})} = \frac{1}{S_{k-1}} \left( a(S_{k-2}, S_{k-1}) - \frac{1}{a(S_{k-2}, S_{k-1})} \right), \qquad (21)$$

where the last equality is also true for the case $S_{k-1} = 0$ if we treat 0/0 = 0; and $c(S_{k-1})$ is the lth
element of $\mathbf{c}(S_{k-1})$, given by

$$c(S_{k-1}) := 1. \qquad (22)$$

Whenever $S_k$, k < 0, is encountered, it is treated as s[1]. Note that the above choice of A and $\mathbf{b}$ uses
the augmented channel state $(S_{k-2}, S_{k-1})$ as we have mentioned in Section III-B.
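The multiplexed recursion with the parameter choice above can be simulated directly. The sketch below is a minimal illustration for an arbitrary number of states: since A is diagonal, it stores only the running diagonal product, and it updates the single active subsystem at each step. The gains, the power allocation (illustrative, not the optimal one from [4]), the state sequence, and the function name are hypothetical. The returned residual makes the decoupling visible: for each subsystem, the estimation error equals the next transmitter state divided by the accumulated open-loop gain.

```python
import math
import random

def run_scheme(x0, s, gamma, state_idx, rng):
    """Simulate the scheme for a given sequence of state indices S_0, ..., S_K.

    Returns (x0_hat, residual), where residual[i] is the ith entry of
    (prod_j A(S_{j-1}, S_j))^{-1} x_{K+1}.
    """
    m = len(s)
    x = list(x0)                # x_0 (follows from x_{-1} = A^{-1} x_0, y_{-1} = 0)
    x0_hat = [0.0] * m
    prod = [1.0] * m            # diagonal of the running product of A matrices
    prev = 0                    # S_{-1} is treated as s[1], i.e. index 0
    for cur in state_idx:       # cur indexes S_k, prev indexes S_{k-1}
        u = x[prev]                                      # u_k = c(S_{k-1})' x_k
        y = s[cur] * u + rng.gauss(0.0, 1.0)             # y_k = S_k u_k + N_k
        a = math.sqrt(s[cur] ** 2 * gamma[prev] + 1.0)   # a(S_{k-1}, S_k)
        b = gamma[prev] * s[cur] / a                     # b(S_{k-1}, S_k)
        prod[prev] *= a
        x0_hat[prev] += b * y / prod[prev]               # receiver update
        x[prev] = a * x[prev] - b * y                    # transmitter state x_{k+1}
        prev = cur
    residual = [x[i] / prod[i] for i in range(m)]
    return x0_hat, residual

# Hypothetical two-state example (illustrative values only).
P = [[0.9, 0.1], [0.3, 0.7]]
s = [1.0, 0.2]
gamma = [1.2, 0.4]              # average power 0.75*1.2 + 0.25*0.4 = 1
rng = random.Random(3)
state_idx, st = [], 0
for _ in range(200):
    state_idx.append(st)
    st = rng.choices([0, 1], weights=P[st])[0]
x0 = [0.3, -0.1]
x0_hat, residual = run_scheme(x0, s, gamma, state_idx, rng)
```

Each subsystem only advances on the steps where it is selected, so a subsystem tied to a rarely visited or weak state accumulates gain more slowly and its estimate converges more slowly.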
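We can rewrite the dynamics of the transmitter state $x_k$ in (18) as

$$x_k = A_{cl}(S_{k-2},S_{k-1})\,x_{k-1} - \mathbf{b}(S_{k-2},S_{k-1})\,N_{k-1}, \qquad (23)$$

where

$$A_{cl}(S_{k-2},S_{k-1}) := A(S_{k-2},S_{k-1}) - S_{k-1}\,\mathbf{b}(S_{k-2},S_{k-1})\,\mathbf{c}(S_{k-2})' \qquad (24)$$

is the closed-loop matrix for generating $x_k$. Let us assume $S_{k-2} = s[j]$. With the above choice of parameters, we then obtain that $A_{cl}(S_{k-2},S_{k-1})$ in (24) is a diagonal matrix whose $(i,i)$ element is 1 if $i \neq j$, and is

$$a(S_{k-2},S_{k-1}) - S_{k-1}\,b(S_{k-2},S_{k-1})\,c(S_{k-2}) = a(S_{k-2},S_{k-1})^{-1} \qquad (25)$$

if $i = j$. Hence, we have

$$
x_k^{(i)} =
\begin{cases}
a(S_{k-2},S_{k-1})^{-1}\,x_{k-1}^{(i)} - b(S_{k-2},S_{k-1})\,N_{k-1} & \text{if } i = j \\
x_{k-1}^{(i)} & \text{otherwise,}
\end{cases}
\qquad (26)
$$

or equivalently in matrix form (noticing that $A_{cl}(S_{k-2},S_{k-1}) = A(S_{k-2},S_{k-1})^{-1}$)

$$x_k = A(S_{k-2},S_{k-1})^{-1}\,x_{k-1} - \mathbf{b}(S_{k-2},S_{k-1})\,N_{k-1}. \qquad (27)$$

More explicitly,

$$
x_k = \mathrm{diag}([1,\cdots,1,a(S_{k-2},S_{k-1})^{-1},1,\cdots,1])
\begin{bmatrix} x_{k-1}^{(1)} \\ \vdots \\ x_{k-1}^{(m)} \end{bmatrix}
- \begin{bmatrix} 0 \\ \vdots \\ b(S_{k-2},S_{k-1}) \\ \vdots \\ 0 \end{bmatrix} N_{k-1}.
\qquad (28)
$$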

This illustrates that, conditioned on the channel state sequence $S_0^k$, the evolution of the subsystem of $x^{(i)}$ does not involve $x^{(l)}$ for any $l \neq i$.
For future reference, we define

$$a[j] := \prod_{l=1}^{m}\big(a(s[j],s[l])\big)^{\pi[j]\,p_{jl}} \quad \text{for } j = 1,\cdots,m \qquad (29)$$

and

$$a := \prod_{j=1}^{m} a[j]. \qquad (30)$$



E. Cheap control 

Equations (23) and (24) specify a control system, referred to as the control setup, in which we want to minimize the average power of $u$, namely the channel input power, by designing $\mathbf{b}$ for a given $A$ and $\mathbf{c}$. This is a cheap control problem over a Markov channel. The presence of cheap control in the optimal communication scheme is necessary because, for any fixed, unstable matrix $A(S_{k-2},S_{k-1})$, the supremum signalling rate is fixed under certain stability conditions. It follows that one needs to minimize, by design, the average channel input power, namely the average power of $u_k = \mathbf{c}(S_{k-1})'x_k$, a cost quadratic in $x_k$; this is easily reformulated as a cheap control problem. The resulting (minimum) average power of $u_k$ will be computed in Proposition 2. In this subsection, we briefly discuss the cheap control problem.

In Fig. 4, we illustrate how we obtain the optimal communication scheme of Fig. 3 from the cheap control. By linearity, it holds that

$$x_k = \bar{x}_k + \tilde{x}_k, \qquad (31)$$

where $\bar{x}_k$ is the zero-input response (generated purely from the initial condition $x_0$), and $\tilde{x}_k$ is the zero-state response (generated purely from the external input); this is shown in the left part of Fig. 4(b). Note that $-\tilde{x}_k$ can be alternatively constructed as indicated in the right part of Fig. 4(b). Since $x_k$ is bounded due to the closed-loop stability and since $\bar{x}_k$ grows "exponentially" (in a time-varying manner), it holds that

$$\bar{x}_k \approx -\tilde{x}_k \qquad (32)$$

for sufficiently large $k$. Therefore, without actually knowing $x_0$ beforehand, the right part of Fig. 4(b) can use $-\tilde{x}_k$ to approximate $\bar{x}_k$ and hence reconstruct $x_0$ as the estimate $\hat{x}_{0,k}$ by applying $\prod_{j=0}^{k} A(S_{j-1},S_j)^{-1}$, provided that $\{S_k\}$ is known.^6 Then block diagram transformations (that shift the one-step delay into the feedback link) lead to the proposed communication scheme in Fig. 3, which can be used to convey $x_0$. This is how the seemingly complicated communication scheme is obtained.

One can see that, when a subsystem is activated, the closed-loop eigenvalue of this subsystem is placed at the reciprocal of the open-loop eigenvalue, which resembles the cheap control design for a Gaussian channel without fading [14].

Fig. 4. (a) Cheap control of the MJLS. (b) Intermediate step towards the optimal communication scheme.



Below we discuss some additional properties of the cheap control problem. Consider the cheap control of the MJLS shown in Fig. 4(a). More specifically, the closed-loop control system is

$$
\begin{aligned}
x_{k+1} &= A(S_{k-1},S_k)\,x_k - \mathbf{b}\,y_k \\
u_k &= \mathbf{c}(S_{k-1})'\,x_k \\
y_k &= S_k u_k + N_k,
\end{aligned}
\qquad (33)
$$

in which $A(S_{k-1},S_k)$ and $\mathbf{c}(S_{k-1})$ are given for any $k$; $y_k$ is the noisy measurement of the system output and the input to the controller; $\mathbf{b}$ is the controller gain to be designed to ensure the minimum average power of $u$, with the discrete states $S_0^k$ known to the controller at time $k$; and $\mathbf{b}y_k$ is the controller's output (also known as the control input to the system that drives the system state). Namely, we consider the MJLS with noisy observation $y_{k-1}$ and perfect knowledge of the Markov states $S_0^{k-1}$, and the associated optimal control problem is

$$\min_{\mathbf{b}}\ \lim_{k\to\infty}\frac{1}{k+1}\,\mathbf{E}\sum_{t=0}^{k}(u_t)^2. \qquad (34)$$

^6 An alternative way to see the convergence of $\hat{x}_{0,k}$ to $x_0$ is to combine Lemma 1 with, first, the stability of the system for $x_k$, and second, the fact that $\Phi_k$ is vanishing (Lemma 2 i) and ii)).

Recall that $A(S_{k-2},S_{k-1})$ is a diagonal matrix, so if all of its diagonal elements, in absolute value, are no smaller than 1 and some of the diagonal elements are strictly greater than 1, then (33) is open-loop unstable. Similar to the AWGN unstable case, for any given initial condition $x_0$, we may define the average growth rate of $x_k$ in open loop (i.e., $\mathbf{b} = 0$ and hence $x_k = A(S_{k-2},S_{k-1})x_{k-1}$) as

$$
\begin{aligned}
\lim_{k\to\infty}\frac{1}{k}\log\frac{\left|\prod_{i=1}^{m}x_k^{(i)}\right|}{\left|\prod_{i=1}^{m}x_0^{(i)}\right|}
&= \lim_{k\to\infty}\frac{1}{k}\sum_{j=1}^{m}\sum_{l=1}^{m} n(j,l,k)\,\log a(s[j],s[l]) \\
&\overset{(a)}{=} \sum_{j=1}^{m}\sum_{l=1}^{m}\pi[j]\,p_{jl}\,\log a(s[j],s[l]) \\
&\overset{(b)}{=} \mathbf{E}_{S_{k-2},S_{k-1}}\log\det\big(A(S_{k-2},S_{k-1})\big) \\
&= \log a,
\end{aligned}
\qquad (35)
$$

where $a$ is defined in (30), (a) is to be interpreted as convergence in probability, (b) follows from the definition of $A(S_{k-2},S_{k-1})$, and the last equality is shown in Appendix A. In other words, the average growth rate of the state is equal to the expected degree of instability of $A(S_{k-2},S_{k-1})$.
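The double sum in (35) is straightforward to evaluate once the transition matrix and the power allocation are fixed. The sketch below makes several assumptions that are not in the paper: the numerical values are hypothetical, the stationary distribution is found by plain power iteration (any method would do), and logarithms are natural. It uses $\log a(s[j],s[l]) = \tfrac{1}{2}\log\big((s[l])^2\gamma(s[j]) + 1\big)$ from (20).

```python
import math

def stationary(P, iters=10_000):
    """Stationary distribution of a row-stochastic matrix P by power iteration."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def log_growth_rate(fades, powers, P):
    """sum_j sum_l pi[j] * p_jl * log a(s[j], s[l]), i.e. log a in (35)."""
    pi = stationary(P)
    m = len(fades)
    return sum(
        pi[j] * P[j][l] * 0.5 * math.log(fades[l] ** 2 * powers[j] + 1.0)
        for j in range(m)
        for l in range(m)
    )

# Hypothetical two-state channel (assumed values, for illustration only).
P = [[0.7, 0.3], [0.4, 0.6]]
print(log_growth_rate([0.5, 2.0], [1.5, 0.5], P))

# Constant fade S_k = 1 with power 3: the AWGN value (1/2) log(1 + 3).
print(log_growth_rate([1.0], [3.0], [[1.0]]), 0.5 * math.log(4.0))
```

With a single state the function returns the familiar AWGN expression, which matches the remark that follows about the constant-state special case.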

In the special case that $S_k$ is constant throughout, (34) and (35) reduce to (16) and (17), the counterparts for cheap control over an AWGN channel. If $A$ and $\mathbf{c}$ are given according to (19), (20), and (22), then $\mathbf{b}$ according to (19) and (21) is the optimal choice. If, instead, $A$ and $\mathbf{c}$ are not given but a constraint on $A$ is present, i.e., the average growth rate of the open-loop system is given, then it can be shown that $A$ and $\mathbf{c}$ given in (19), (20), and (22) will lead to the minimum power, subject to the above constraint. Detailed computation is omitted here, but the validity will become clear when combining 1) the relation between the control problem and the communication problem shown below and 2) the optimality in the communication problem established in the next section.



F. Decoupling of states 

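To capture the decoupling property presented in Section III-D precisely, we provide a lemma after introducing some notation.

Fix a channel state sequence $S_0^k$. If $S_{\tau-1} = s[j]$ and $S_\tau = s[l]$, we have

$$A(S_{\tau-1},S_\tau)^{-1} = \mathrm{diag}([1,\cdots,1,a(s[j],s[l])^{-1},1,\cdots,1]),$$

and we can then obtain

$$
\begin{aligned}
\prod_{\tau=0}^{k} A(S_{\tau-1},S_\tau)^{-1}
&= \mathrm{diag}\Big(\Big[\prod_{l=1}^{m} a(s[1],s[l])^{-n(1,l,k)},\ \cdots,\ \prod_{l=1}^{m} a(s[m],s[l])^{-n(m,l,k)}\Big]\Big) \qquad (36) \\
&=: \mathrm{diag}\big([\phi_k^{(1)},\cdots,\phi_k^{(m)}]\big) =: \Phi_k, \qquad (37)
\end{aligned}
$$

where $\phi_k^{(j)}$ and $\Phi_k$ are defined in the obvious manner; recall Definition 1 for notation $n(i,l,k)$.

Lemma 1. Fix any channel state sequence $S_0^k$ and initial condition $x_0$. Then

$$
\begin{aligned}
x_{k+1} &= \Phi_k x_0 - \sum_{\tau=0}^{k}\Phi_k\Phi_\tau^{-1}\,\mathbf{b}(S_{\tau-1},S_\tau)\,N_\tau \\
\hat{x}_{0,k} &= x_0 - \Phi_k x_{k+1} \\
\hat{x}_{0,k}^{(j)} &= \big(1 - (\phi_k^{(j)})^2\big)\,x_0^{(j)} + \Big[\sum_{\tau=0}^{k}\Phi_k^2\Phi_\tau^{-1}\,\mathbf{b}(S_{\tau-1},S_\tau)\,N_\tau\Big]_j,
\end{aligned}
\qquad (38)
$$

where in the last equation, $[\cdot]_j$ extracts the $j$th entry of the vector.

Proof: The first equation is proved by recursively applying (27) and (37). For the second equation, straightforward computation shows that

$$\hat{x}_{0,k} = \sum_{\tau=0}^{k}\Phi_\tau\,\mathbf{b}(S_{\tau-1},S_\tau)\,y_\tau = \sum_{\tau=0}^{k}\big(\Phi_{\tau-1}x_\tau - \Phi_\tau x_{\tau+1}\big), \qquad (39)$$

with $\Phi_{-1} := I$; the second sum uses $\mathbf{b}(S_{\tau-1},S_\tau)y_\tau = A(S_{\tau-1},S_\tau)x_\tau - x_{\tau+1}$ from (18) and telescopes to $x_0 - \Phi_k x_{k+1}$. Finally, the third equation follows from substituting the first equation into the second and from the fact that $(I - \Phi_k^2)$ is a diagonal matrix. □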
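The relation $\hat{x}_{0,k} = x_0 - \Phi_k x_{k+1}$ between the receiver estimate and the transmitter state is purely algebraic, so it can be checked on a simulated sample path. The following is a minimal sketch under stated assumptions, not the authors' implementation: unit-variance noise, hypothetical fades, powers, and transition matrix, $S_{-1} := s[1]$, and $S_0$ drawn from the row of $P$ belonging to $s[1]$. Because all matrices are diagonal, only the entry of the activated subsystem is touched at each step.

```python
import math
import random

def run_scheme(fades, powers, P, x0, k_max, seed=0):
    """Run the recursions (18) for k = 0..k_max on one random sample path."""
    rng = random.Random(seed)
    m = len(fades)
    x = list(x0)          # transmitter state x_k, starting from x_0
    x_hat = [0.0] * m     # receiver estimate, initialised to zero
    phi = [1.0] * m       # diagonal of Phi_k, starting from the identity
    prev = 0              # index of S_{k-1}; S_{-1} := s[1]
    for _ in range(k_max + 1):
        u = x[prev]                                          # u_k = c(S_{k-1})' x_k
        cur = rng.choices(range(m), weights=P[prev])[0]      # draw S_k
        y = fades[cur] * u + rng.gauss(0.0, 1.0)             # y_k = S_k u_k + N_k
        a = math.sqrt(fades[cur] ** 2 * powers[prev] + 1.0)  # a(S_{k-1}, S_k)
        b = powers[prev] * fades[cur] / a                    # b(S_{k-1}, S_k)
        phi[prev] /= a                                       # Phi_k = Phi_{k-1} A^{-1}
        x_hat[prev] += phi[prev] * b * y                     # receiver update
        x[prev] = a * x[prev] - b * y                        # transmitter state x_{k+1}
        prev = cur
    return x_hat, phi, x

x0 = [0.25, -0.125]
x_hat, phi, x_next = run_scheme([0.5, 2.0], [1.5, 0.5], [[0.7, 0.3], [0.4, 0.6]], x0, 400, seed=1)
for j in range(len(x0)):
    print(j, x_hat[j], x0[j] - phi[j] * x_next[j])
```

The two printed columns should agree up to rounding, and since the entries of $\Phi_k$ shrink along the path, both approach $x_0$.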

From the last equation, we see that each sub-codeword $x_0^{(j)}$ is transmitted independently from other sub-codewords. To help the reader better understand the decoupling of subsystems, let us call the subsystem associated with $x_k^{(j)}$ the $j$th subsystem, denoted $\Sigma_{s[j]}$. If it holds that $S_k = s[j]$, then we can also use the notation $\Sigma_{S_k}$ (i.e. $\Sigma_{S_k} = \Sigma_{s[j]}$ if $S_k = s[j]$). The $j$th subsystem typically goes through the following cycle of operations: holding; updating the receiver estimate (if the immediate past channel state was $s[j]$); updating the transmitter state (if the second immediate past channel state was $s[j]$); holding. Consequently, any two subsystems that are not on hold do not perform the same updating operation at the same time. That is, our design ensures mutually exclusive updates among subsystems and hence the interaction-free evolution for subsystems. This simplifies the encoding/decoding processes.

We point out that the decoupling is not possible if an augmented channel state is not used. See Fig. 5 for the schematic of the multiplexing scheme. At time $k$, if, for some $t := t(k)$, the transmitter selects $\Sigma_{S_t}$ to use the forward link, then the receiver should select $\Sigma_{S_t}$ to receive to avoid interference. If, however, for some $r := r(k)$, the receiver selects $\Sigma_{S_r}$ to use the feedback link, then the transmitter should select $\Sigma_{S_{r-1}}$ to receive to avoid interference. No matter how one may pick $t$ and $r$, at least two channel states are needed.



Fig. 5. The schematic of the multiplexing scheme.

The proposed scheme is in fact such that $t := r := k - 1$. Therefore, at time $k$, the transmitter sends information for $\Sigma_{S_{k-1}}$, and receives for $\Sigma_{S_{k-2}}$. The receiver receives information for $\Sigma_{S_{k-1}}$ (as opposed to $\Sigma_{S_k}$, despite the fact that it knows $S_k$; this is because the transmitter can only send for $\Sigma_{S_{k-1}}$), and sends for $\Sigma_{S_{k-1}}$. Again, the decoupling may be seen more easily by studying the control setup (e.g. (28) and Lemma 1).
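The multiplexing rule just described can be tabulated for a given state sequence. This is a small illustrative sketch; the function name and the dictionary keys are ours, and states with negative time index are treated as $s[1]$ (index 0) as in the parameter section.

```python
def schedule(state_indices):
    """For each time k, list which subsystem each side serves.

    state_indices[k] is the index j such that S_k = s[j], for k = 0, 1, ...
    """
    def idx(k):
        # S_k for k < 0 is treated as s[1], i.e. index 0.
        return state_indices[k] if k >= 0 else 0

    rows = []
    for k in range(len(state_indices)):
        rows.append({
            "k": k,
            "tx_sends_for": idx(k - 1),     # forward link carries subsystem of S_{k-1}
            "tx_updates_for": idx(k - 2),   # transmitter state update uses S_{k-2}
            "rx_updates_for": idx(k - 1),   # receiver estimate update uses S_{k-1}
        })
    return rows

for row in schedule([1, 0, 1, 1]):
    print(row)
```

In every row the subsystem served by the forward link is the one whose receiver estimate is updated, while the transmitter-state update lags by one more step.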



G. Assumptions 

In what follows, the choice of parameters according to (19)-(22) and Assumptions A1 and A2 are assumed unless otherwise specified:

A1: $\gamma(s[j]) > 0$ for each $j$, i.e., each state $s[j]$ is assigned a nonzero power;
A2: $s[j] \neq 0$ for each $j$, i.e., the channel contains no erasure.

As a consequence of the assumptions, it holds that $a(s[j],s[l]) > 1$ for each $j$ and $l$, that $a[j] > 1$ for each $j$, and that $a > 1$ (refer to (20), (29), and (30) for the definitions).

We adopt these assumptions only for convenience; our main results remain true even if the assumptions do not hold. In fact, when A1 and A2 hold, we have $a(s[j],s[l]) > 1$ and the mapping from $x_{k-1}^{(j)}$ to $x_k^{(j)}$ in (26) is always "strictly contractive", which simplifies the development of a convergence result (Lemma 2) that will be used to prove the main results. On the other hand, if A1 or A2 does not hold, then when the subsystem assigned zero power is activated, or when the channel state is an erasure, the receiver receives only AWGN, and no information about the to-be-transmitted message flows across either the forward channel or the feedback channel. At such a moment, the transmitter state, the receiver state, and the receiver estimate retain their values; namely, we have $A(S_{k-2},S_{k-1}) = I$ and $\mathbf{b}(S_{k-2},S_{k-1}) = 0$ for some $k$ and hence $x_k = x_{k-1}$, which is not "strictly contractive". This would require a couple of extra steps in establishing the (same) main results in a few places of our development. For convenience, we develop the main results under A1 and A2 in the main body of the paper, and defer the description of the extra steps to Appendix B.

We also note that A2 can be directly verified from the given channel model, and A1 can also be easily verified by 1) checking the optimal solution $\gamma(s[i])$ computed efficiently by a numerical solver; or 2) applying the "complementary slackness" condition (an inequality constraint holds strictly if and only if its multiplier is zero, namely $\gamma(s[i]) > 0$ if and only if the $i$th multiplier is zero; cf. [43]) to the optimization problem, which has been widely used to determine if a constraint is active or not.



H. Encoding/decoding method 
Encoding and decoding 

Fix the coding length to be $(k+1)$, i.e., we use the channel from time 0 to time $k$. We define $B \subset \mathbb{R}^m$ to be the unit hypercube centered at the origin, with each side ($j$th side denoted as $B^{(j)}$) being the segment $[-\frac{1}{2}, \frac{1}{2}]$. For any fixed $\epsilon > 0$, and for each $j$, let $B^{(j)}$ be uniformly partitioned into $\lfloor M_k^{(j)} \rfloor$ sub-intervals, where

$$M_k^{(j)} := a[j]^{(k+1)(1-\epsilon)} \qquad (40)$$

and $\lfloor M \rfloor$ denotes the largest integer no greater than $M$. For each $k$ and $j$, it holds that

$$\lfloor M_k^{(j)} \rfloor = \frac{M_k^{(j)}}{\theta_k^{(j)}} \qquad (41)$$

for some $\theta_k^{(j)} \in [1,2)$ (noting that $M_k^{(j)} > 1$). Now $B$ is partitioned into

$$M_k := \prod_{j=1}^{m} \lfloor M_k^{(j)} \rfloor \qquad (42)$$

sub-hypercubes. Let the center of each sub-hypercube represent one of a set of $M_k$ equally likely messages. Call the sub-hypercube centers the codewords, the sub-interval centers the sub-codewords, and the set of codewords the codebook.

For encoding, choose one codeword from the $M_k$ centers, say $W$, and let $x_0 := W$. Then $x_0$ enters system (18) as the initial condition and generates the channel input sequence $u_0^k$. For decoding, based on the channel output sequence $y_0^k$, the receiver calculates $\hat{x}_{0,k}^{(j)}$ for each $j$, and then decides the sub-interval center closest to $\hat{x}_{0,k}^{(j)}$ to be the $j$th sub-codeword of the codeword transmitted by the transmitter. The decision at time $k$ is denoted as $\hat{W}_k$. See Fig. 6 for a simple example of a codebook.
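The codebook construction and the nearest-center decoding rule can be sketched in a few lines. This is an illustrative sketch with hypothetical helper names; it takes the numbers of sub-intervals per dimension as given rather than computing them from (40), and decodes each coordinate separately, which the decoupling of subsystems permits.

```python
from itertools import product

def subinterval_centers(n):
    """Centers of n equal sub-intervals of [-1/2, 1/2] (the sub-codewords)."""
    return [-0.5 + (i + 0.5) / n for i in range(n)]

def codebook(counts):
    """All codewords: one sub-interval center per dimension."""
    return list(product(*(subinterval_centers(n) for n in counts)))

def decode(x_hat, counts):
    """Per-dimension decision: the sub-interval center closest to the estimate."""
    return [
        min(subinterval_centers(n), key=lambda c: abs(c - v))
        for v, n in zip(x_hat, counts)
    ]

counts = [3, 2]                       # m = 2 with 3 and 2 sub-intervals
print(len(codebook(counts)))          # number of codewords
print(subinterval_centers(3), subinterval_centers(2))
print(decode([-0.30, 0.20], counts))  # a noisy estimate near (-1/3, 1/4)
```

With three and two sub-intervals the centers are $\{-1/3, 0, 1/3\}$ and $\{-1/4, 1/4\}$, which gives six codewords in total.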



Fig. 6. An example of codebook. Assume $m = 2$, $\lfloor M_k^{(1)} \rfloor = 3$, and $\lfloor M_k^{(2)} \rfloor = 2$, namely $M_k = 6$. The six messages are represented by six codewords, the centers of the sub-hypercubes. Suppose that message msg[1] is to be conveyed. Then codeword cw[1] is to be transmitted, and the sub-codewords are $x_0^{(1)} = -1/3$ and $x_0^{(2)} = 1/4$. The two sub-codewords are transmitted through two decoupled subsystems. At the receiver side, if both sub-codewords are correctly decoded, then the codeword cw[1] and hence message msg[1] can be correctly recovered.



We see that the encoding and decoding are fairly simple. In fact, the computation complexity for the encoding and decoding mainly involves computing a product of $m \times m$ diagonal matrices (i.e. $\prod_{j=0}^{k} A(S_{j-1},S_j)^{-1}$), and grows linearly in $(k+1)$, where $(k+1)$ is the number of channel uses.

Signalling rate

We define the signalling rate as

$$R := \lim_{k\to\infty}\frac{1}{k+1}\log M_k = \lim_{k\to\infty}\frac{1}{k+1}\sum_{j=1}^{m}\log\lfloor M_k^{(j)} \rfloor \qquad (43)$$

if the limit exists.
Probability of error 

We declare a decoding error if the decoder's decision $\hat{W}_k$ is not equal to the transmitted codeword $W$. To compute the probability of error $PE_k$, we first compute the probability of error for the $j$th sub-codeword conditioned on the channel state sequence $S_0^k$:

$$PE_{k|S}^{(j)} := \Pr\big(x_0^{(j)} \text{ and } \hat{x}_{0,k}^{(j)} \text{ are in different sub-intervals of } B^{(j)} \,\big|\, S_0^k\big). \qquad (44)$$

Since, conditioned on $S_0^k$, $x^{(i)}$ and $x^{(j)}$ evolve independently for $i \neq j$, we can independently compute $PE_{k|S}^{(j)}$ for each $j$ (see Lemma 1 and Lemma 2 iii)). We then have that $PE_{k|S}$, the probability of error for the codeword $W$ conditioned on the channel state sequence $S_0^k$, is given by

$$PE_{k|S} = 1 - \prod_{j=1}^{m}\big(1 - PE_{k|S}^{(j)}\big). \qquad (45)$$

Finally, the probability of error for decoding $W$, which averages $PE_{k|S}$ over all possible channel state sequences $S_0^k$, is

$$PE_k := \sum_{S_0^k \in \Omega_k} PE_{k|S}\,\Pr(S_0^k); \qquad (46)$$

recall that $\Omega_k$ is defined as the set of all possible channel state sequences of length $(k+1)$. We remark that, though the above definitions are for some fixed $x_0$, since the probability of error for different $x_0$ shares the same asymptotic behavior,^8 it is sufficient to study the probability of error for one fixed initial condition $x_0$.
Power constraint and achievable rate 

For any fixed $k$ and fixed channel state sequence $S_0^{k-1}$, denote the channel inputs generated at time $t$ as $u_{t|S_0^{t-1}}$ or simply $u_{t|S}$, $t = 0, 1, \cdots, k$, to emphasize the dependence on a specific channel state sequence (noticing that the notation $u_t$ by itself does not show this dependence). Then the average transmission power for this channel state sequence is

$$P_{k|S} := \frac{1}{k+1}\sum_{t=0}^{k}\mathbf{E}\,(u_{t|S})^2. \qquad (47)$$

This corresponds to the average power for the transmission of one message for one possible channel state sequence. Then the transmission power averaged over all channel state sequences $S_0^{k-1} \in \Omega_{k-1}$ is

$$P_k := \sum_{S_0^{k-1} \in \Omega_{k-1}} P_{k|S}\,\Pr(S_0^{k-1}). \qquad (48)$$

We say that the power constraint is satisfied if

$$\limsup_{k\to\infty} P_k \le \mathcal{P} \qquad (49)$$

for a given $\mathcal{P} > 0$.

We call a signalling rate $R$, defined in (43), achievable if, as $k$ tends to infinity, $PE_k$ decays to zero and the power constraint (49) is satisfied.

To conclude this section, we present Table I comparing the proposed scheme with other related communication schemes. In this table, $a$, $b$, and $c$ are chosen according to (10). TRCSI denotes instantaneous, perfect transmitter-side and receiver-side CSI. Note that Elia's scheme in [12], Goldsmith and Varaiya's schemes in [2], and Viswanathan's schemes in [4] can be used for more general setups that are not considered in this table or in this paper. Also note that the original SK scheme's messages are chosen from the sub-interval centers of interval $[0, 1]$, as opposed to interval $[-0.5, 0.5]$.



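^8 In fact, for $k$ sufficiently large, $\hat{x}_{0,k}$ has the form $\hat{x}_{0,k} = x_0 + \Delta_k$ (see (38)), where $\Delta_k$ does not depend on $x_0$. Therefore, asymptotically the decoding error does not depend on $x_0$.

TABLE I
Comparison of the Proposed Scheme with Other Communication Schemes

Proposed
  Channel model: $y_k = S_k u_k + N_k$; DTRCSI, output feedback
  Capacity: $\frac{1}{2}\mathbf{E}_{S_{k-1},S_k}\log\big(1 + (S_k)^2\gamma(S_{k-1})\big)$
  Tx operations: $x_k = A(S_{k-2},S_{k-1})x_{k-1} - \mathbf{b}(S_{k-2},S_{k-1})y_{k-1}$; $u_k = \mathbf{c}(S_{k-1})'x_k$; update the transmitter state associated with $S_{k-2}$; transmit the transmitter state associated with $S_{k-1}$
  Rx operations: $\hat{x}_{0,k} = \hat{x}_{0,k-1} + \prod_{j=0}^{k}A(S_{j-1},S_j)^{-1}\,\mathbf{b}(S_{k-1},S_k)\,y_k$; demultiplex according to $S_{k-1}$
  Feedback signal available at Tx: $y_{k-1}$, $S_{k-1}$
  Advantages: bounded signals/parameters; unified operations for all $k$; control-oriented analysis readily applicable; same operations for digital/analog transmissions
  Disadvantages: Rx estimate of $x_0$ is biased (asymptotically unbiased)

SK's
  Channel model: $y_k = u_k + N_k$; output feedback
  Capacity: $\frac{1}{2}\log(1 + P)$
  Tx operations: $u_0 = a(0.5 - x_0)$; for $k > 0$: $u_k = a^{k}(\hat{x}_{0,k-1} - x_0)$
  Rx operations: $\hat{x}_{0,0} = 0.5 - a^{-1}y_0$; for $k > 0$: $\hat{x}_{0,k} = \hat{x}_{0,k-1} - a^{-k-1}\sqrt{a^2 - 1}\,y_k$
  Feedback signal available at Tx: $\hat{x}_{0,k-1}$
  Advantages: unbiased Rx estimate of $x_0$; same operations for digital/analog transmissions
  Disadvantages: unbounded parameter $a^{k}$; initial operation differs from later ones

Gallager's
  Channel model: $y_k = u_k + N_k$; output feedback
  Capacity: $\frac{1}{2}\log(1 + P)$
  Tx operations: $x_k = a(x_{k-1} - \tilde{y}_{k-1})$; $u_k = x_k$
  Rx operations: $\hat{x}_k = \hat{x}_{k-1} + a^{-k-1}b\,y_k$; $\hat{x}_{0,k} = (1 - a^{-2k-2})^{-1}\hat{x}_k$
  Feedback signal available at Tx: $\tilde{y}_{k-1} = a^{-1}b\,y_{k-1}$
  Advantages: bounded signals/parameters; unified operations for all $k$; unbiased Rx estimate of $x_0$
  Disadvantages: different operations for digital and analog transmissions

Elia's
  Channel model: $y_k = u_k + N_k$; output feedback
  Capacity: $\frac{1}{2}\log(1 + P)$
  Tx operations: $x_k = a x_{k-1}$; $u_k = c x_k - \tilde{y}_{k-1}$
  Rx operations: $\hat{x}_k = a\hat{x}_{k-1} + b\,y_k$; $\hat{x}_{0,k} = a^{-k}\hat{x}_k$
  Feedback signal available at Tx: $\tilde{y}_{k-1} = c\,\hat{x}_{k-1}$
  Advantages: unified operations for all $k$; control-oriented analysis readily applicable; same operations for digital/analog transmissions
  Disadvantages: unbounded feedback power; Rx estimate of $x_0$ is biased (asymptotically unbiased)

Goldsmith & Varaiya's
  Channel model: $y_k = S_k u_k + N_k$; TRCSI
  Capacity: $\frac{1}{2}\mathbf{E}_{S_k}\log\big(1 + (S_k)^2\gamma(S_k)\big)$
  Tx operations: multiplexing according to $S_k$
  Rx operations: demultiplex according to $S_k$
  Feedback signal available at Tx: $S_k$

Viswanathan's
  Channel model: $y_k = S_k u_k + N_k$; DTRCSI, with/without output feedback
  Capacity: $\frac{1}{2}\mathbf{E}_{S_{k-1},S_k}\log\big(1 + (S_k)^2\gamma(S_{k-1})\big)$
  Tx operations: multiplexing according to $S_{k-1}$
  Rx operations: demultiplex according to $S_{k-1}$
  Feedback signal available at Tx: $S_{k-1}$ ($y_{k-1}$)
  Disadvantages: output feedback not fully explored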



IV. Achieving capacity of channel F 

In this section, we show that the feedback communication scheme proposed in Section III along with the parameters given by (19)-(22) is capacity-achieving. Our main result is summarized in Theorem 1.

Theorem 1. Suppose that for channel F, the channel state $S_t$ forms an ergodic Markov process and is available instantaneously to the receiver and with one step delay to the transmitter. Given any $\mathcal{P} > 0$, let $\gamma(\cdot)$ be the capacity-achieving power allocation that maps the channel state $S_k$ to the channel input power $\gamma(S_k)$ and such that $\mathbf{E}_S\gamma(S) \le \mathcal{P}$ holds. Then, the feedback communication scheme described in Section III along with the parameters given by (19)-(22), under Assumptions A1 and A2, achieves any rate arbitrarily close to the feedback capacity

$$C = \frac{1}{2}\mathbf{E}_{S_k,S_{k+1}}\log\big(1 + (S_{k+1})^2\gamma(S_k)\big) = \log a \qquad (50)$$

under average input power constraint

$$\mathbf{E}u^2 \le \mathcal{P}. \qquad (51)$$

To prove this theorem, we first compute the achievable rate, followed by showing that the power constraint (49) is satisfied. See Appendix A for the proof of $C = \log a$.

We note that in [4], the capacity for Gaussian channels as given in (50) is only in terms of the maximum mutual information. Whether it is equal to the operational capacity is not explicitly proven in [4]. ([4] proved the achievability and converse for discrete FSMCs but not for Gaussian FSMCs.) To be complete, we provide the converse in Appendix B. Combining the converse and the achievability shown in Theorem 1, we establish that the maximum mutual information is indeed equal to the operational capacity.

A. Achievable rate 

In this part, we prove that for any $\epsilon > 0$ small enough, rate $(1-\epsilon)C$ is achievable. This development is facilitated by considering the control setup. In fact, for an AFSMC, whenever the control setup in Fig. 4(a) is open-loop unstable but closed-loop stabilized, the communication system in Fig. 3 can achieve any rate smaller than the growth rate of the open-loop control setup, similar to the case of Gaussian channels without time-selective fading [12]. Thus, our first step is to establish relevant stability results for the control setup.

Our choice of parameters in Section III-D leads to the conclusion that the open loop of the control setup is unstable. The average rate of growth for this unstable system is $\log a$. This is exactly the supremum achievable rate for the proposed communication system. Next, we show that the control setup is stabilized in closed loop, based on which we prove the signalling rate is achievable. Note that the stability of the control setup is in the sense of, firstly, the boundedness and convergence to zero of the first moment $\mathbf{E}x_k$, and secondly, the boundedness of the variance-covariance matrix $\Sigma_k := \mathbf{E}x_k x_k' - \mathbf{E}x_k\mathbf{E}x_k'$, both conditioned on a given channel state sequence $\{S_k\}$ and initial condition $x_0$. (Note that when conditioned on $\{S_k\}$ and $x_0$, the randomness in $x_k$ comes from the AWGN $N_0^{k-1}$.) Finally, note that if one fixes the choice of $A(S_k,S_{k+1})$, then the open-loop growth rate as well as the supremum signalling rate is fixed, and thus $\mathbf{b}(S_k,S_{k+1})$ can be chosen to ensure the stability of the closed loop as well as the minimum average input power (in order to achieve the most power-efficient communication), which is exactly a cheap control problem.

Lemma 2. Assume the hypotheses of Theorem 1 and fix a channel state sequence $\{S_k\}$ in $\Omega$. Then for the control setup (23),

i) $\Phi_k$ is the state transition matrix, namely the response due to initial condition $x_0$ is $\bar{x}_{k+1} = \Phi_k x_0$. For every $j = 1,\cdots,m$, it holds that $0 < \phi_k^{(j)} \le 1$ for any $k$ and

$$\phi_k^{(j)} \to 0 \qquad (52)$$

for any $j$;

ii) For any fixed initial condition $x_0$, it holds that for any $j$, $-|x_0^{(j)}| \le \mathbf{E}x_k^{(j)} \le |x_0^{(j)}|$ for any $k$ and that $\mathbf{E}x_k^{(j)} \to 0$;

iii) For any $x_0$, $\Sigma_k := \mathbf{E}(x_k - \mathbf{E}x_k)(x_k - \mathbf{E}x_k)'$ is a diagonal matrix, i.e., the components of $x_k$ are mutually independent, i.e. $\mathbf{E}x_k^{(i)}x_k^{(j)} = \mathbf{E}x_k^{(i)}\,\mathbf{E}x_k^{(j)}$ if $i \neq j$, and there exist $\underline{c}$ and $\bar{c}$ such that

$$0 < \underline{c} < \Sigma_k^{(j)} < \bar{c} < \infty \qquad (53)$$

for any $k$, $j$, and $\{S_k\}$, where $\Sigma_k^{(j)}$ is the $(j,j)$th element of $\Sigma_k$.

Proof: See Appendix C. □

Now we can use the stability of the control setup to establish the reliable communication across the channel. Note that the stability of the control setup implies that any signal in the communication scheme is bounded, as claimed in Section I.



Remark 1. The intuition behind the following development is that, in view of the stability shown in Lemma 2, the difference between $x_0^{(j)}$ and $\hat{x}_{0,k}^{(j)}$ vanishes sufficiently fast in both the first and second moments as time increases thanks to closed-loop stability conditions, and so does $PE_{k|S}^{(j)}$, the probability that $x_0^{(j)}$ and $\hat{x}_{0,k}^{(j)}$ are in different sub-intervals, if the channel state sequence is typical. Then because of the decoupling proven in Lemma 1 and Lemma 2, $PE_{k|S}$ can be shown to be vanishing.

Proposition 1. Assume the hypotheses of Theorem 1. Then the communication system reliably transmits at rate

$$R = (1-\epsilon)\log a = (1-\epsilon)C, \qquad (54)$$

for any given $\epsilon > 0$.

Proof: To establish the achievable rate, we first compute the signalling rate, followed by proving that the average probability of error $PE_k$ goes to zero as $k$ goes to infinity, which implies that the signalling rate is achievable.



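Signalling rate

It holds that

$$
\begin{aligned}
R &= \lim_{k\to\infty}\frac{\sum_{j=1}^{m}\log\lfloor M_k^{(j)} \rfloor}{k+1} \\
&= \lim_{k\to\infty}\left(\frac{\sum_{j=1}^{m}\log M_k^{(j)}}{k+1} - \frac{\sum_{j=1}^{m}\log\big(M_k^{(j)}/\lfloor M_k^{(j)} \rfloor\big)}{k+1}\right) \\
&= \lim_{k\to\infty}\frac{\sum_{j=1}^{m}\log a[j]^{(k+1)(1-\epsilon)}}{k+1} \\
&= (1-\epsilon)\sum_{j=1}^{m}\log a[j] \\
&= (1-\epsilon)\log a,
\end{aligned}
\qquad (55)
$$

where the second term in the second line vanishes because $M_k^{(j)}/\lfloor M_k^{(j)} \rfloor \in [1,2)$ by (41).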
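The effect of the floor operation on the rate can be seen numerically: each $\log\lfloor M_k^{(j)} \rfloor$ differs from $\log M_k^{(j)}$ by less than $\log 2$, so the gap in the rate is at most $m\log 2/(k+1)$. The sketch below uses hypothetical values of $a[j]$ and $\epsilon$ (assumptions for illustration) and natural logarithms.

```python
import math

def signalling_rate(a_sub, eps, k):
    """(1/(k+1)) * sum_j log floor(a[j]^((k+1)(1-eps))), following (40) and (43)."""
    total = 0.0
    for a_j in a_sub:
        m_kj = a_j ** ((k + 1) * (1.0 - eps))   # M_k^{(j)} in (40)
        total += math.log(math.floor(m_kj))     # log of the number of sub-intervals
    return total / (k + 1)

a_sub = [1.3, 1.6]   # hypothetical a[1], a[2]
eps = 0.1
target = (1.0 - eps) * sum(math.log(a_j) for a_j in a_sub)   # (1 - eps) log a

for k in (10, 100, 400):
    rate = signalling_rate(a_sub, eps, k)
    print(k, rate, target, target - rate)
```

The printed gap is nonnegative and shrinks roughly like $1/(k+1)$, consistent with the limit in (55).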



Probability of error

The proof of vanishing probability of error is essentially an AEP-based argument (cf. [36]). Fix $\epsilon > 0$. Fix $\mu > 0$. Recall the definition of the $(k,\mu)$-typical set $\Omega_{k,\mu}$, where $\mu > 0$ is to be determined as a function of $\epsilon$ but independent of $k$. Then $\Omega_k$ is partitioned into two subsets: $\Omega_{k,\mu}$ and $\Omega_{k,\mu}^c$, where the superscript $c$ denotes the complement in $\Omega_k$. Our goal is to show that the probability of error $PE_{k|S}$ satisfies

$$PE_{k|S} \le \kappa(k), \quad \forall S_0^k \in \Omega_{k,\mu}, \qquad (56)$$

for some vanishing function $\kappa(\cdot) > 0$ which is independent of $k$ (i.e., $\kappa(\cdot)$ is a time-invariant function). This would lead to vanishing probability of error $PE_k$. More precisely, note that

$$
\begin{aligned}
PE_k &:= \sum_{S_0^k \in \Omega_k} PE_{k|S}\,\Pr(S_0^k) \\
&= \sum_{S_0^k \in \Omega_{k,\mu}} PE_{k|S}\,\Pr(S_0^k) + \sum_{S_0^k \in \Omega_{k,\mu}^c} PE_{k|S}\,\Pr(S_0^k) \\
&\le \kappa(k) + \Pr(\Omega_{k,\mu}^c),
\end{aligned}
\qquad (57)
$$

where we have used $\kappa(\cdot) > 0$, (56), $\sum_{S_0^k \in \Omega_{k,\mu}}\Pr(S_0^k) \le 1$, and $PE_{k|S} \le 1$. Therefore, as $k$ increases, the average probability of error $PE_k$ would decay to zero.

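Now we start by investigating the behavior of $PE_{k|S}^{(j)}$ on $\Omega_{k,\mu}$. Fix any channel state sequence $S_0^k$ and initial condition $x_0$. Because the noise is zero mean i.i.d. Gaussian, by Lemma 1, $x_{k+1}$ is Gaussian with distribution $\mathcal{N}(\mathbf{E}x_{k+1}, \Sigma_{k+1})$ if conditioned on $S_0^k$ and $x_0$, where $\Sigma_{k+1}$ is defined in Lemma 2 iii). Then notice that $\mathbf{E}x_{k+1} = \Phi_k x_0$. Therefore, $\hat{x}_{0,k}$ is an $m$-variate Gaussian vector distributed as

$$\mathcal{N}\big(x_0 - (\Phi_k)^2x_0,\ (\Phi_k)^2\Sigma_{k+1}\big), \qquad (58)$$

and particularly, for each $j$, $\hat{x}_{0,k}^{(j)}$ is a univariate Gaussian distributed as

$$\mathcal{N}\big((1 - (\phi_k^{(j)})^2)\,x_0^{(j)},\ (\phi_k^{(j)})^2\Sigma_{k+1}^{(j)}\big). \qquad (59)$$

We now assume without loss of generality that $x_0^{(j)}$ is the center of the $i$th sub-interval of $B^{(j)}$. See Fig. 7. We can thus derive the following expression of $PE_{k|S}^{(j)}$, which by Lemma 1 and Lemma 2 iii) can be computed independently for each $j$:

$$
\begin{aligned}
PE_{k|S}^{(j)}
&= Q\!\left(\frac{0.5/\lfloor M_k^{(j)} \rfloor + (\phi_k^{(j)})^2x_0^{(j)}}{\phi_k^{(j)}\sqrt{\Sigma_{k+1}^{(j)}}}\right)
+ Q\!\left(\frac{0.5/\lfloor M_k^{(j)} \rfloor - (\phi_k^{(j)})^2x_0^{(j)}}{\phi_k^{(j)}\sqrt{\Sigma_{k+1}^{(j)}}}\right) \\
&= Q\!\left(\frac{0.5\,\theta_k^{(j)}\big/\big(a[j]^{(k+1)(1-\epsilon)}\phi_k^{(j)}\big) + \phi_k^{(j)}x_0^{(j)}}{\sqrt{\Sigma_{k+1}^{(j)}}}\right)
+ Q\!\left(\frac{0.5\,\theta_k^{(j)}\big/\big(a[j]^{(k+1)(1-\epsilon)}\phi_k^{(j)}\big) - \phi_k^{(j)}x_0^{(j)}}{\sqrt{\Sigma_{k+1}^{(j)}}}\right),
\end{aligned}
\qquad (60)
$$

where $\theta_k^{(j)} := M_k^{(j)}/\lfloor M_k^{(j)} \rfloor \in [1,2)$ as in (41). To see the convergence to zero of $PE_{k|S}^{(j)}$, note that by the uniform boundedness of $\theta_k^{(j)} \in [1,2)$, $x_0^{(j)} \in (-0.5, 0.5)$, $\phi_k^{(j)} \in (0,1]$, and $\Sigma_{k+1}^{(j)}$, it is sufficient to show that $a[j]^{(k+1)(1-\epsilon)}\phi_k^{(j)}$ goes to zero.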
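The sub-codeword error probability is a sum of two Gaussian tail probabilities and can be evaluated directly with the complementary error function. The sketch below is illustrative only: the function names are ours, the numerical inputs are hypothetical, and it implements the expression "tail beyond half a sub-interval width on either side" for an estimate with mean $(1-\phi^2)x_0$ and standard deviation $\phi\sqrt{\Sigma}$.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def sub_codeword_error(x0, n_sub, phi, var):
    """Probability that the estimate leaves the sub-interval centered at x0.

    n_sub is the number of sub-intervals, phi the transition gain, and var the
    state variance; the estimate is N((1 - phi^2) x0, phi^2 var).
    """
    half_width = 0.5 / n_sub
    std = phi * math.sqrt(var)
    upper = q_func((half_width + phi ** 2 * x0) / std)   # exits above the sub-interval
    lower = q_func((half_width - phi ** 2 * x0) / std)   # exits below the sub-interval
    return upper + lower

# Hypothetical numbers: 4 sub-intervals, unit variance, shrinking phi.
for phi in (0.5, 0.1, 0.01):
    print(phi, sub_codeword_error(0.0, 4, phi, 1.0))
```

For a fixed number of sub-intervals the error falls off very quickly as $\phi$ shrinks; the proof that follows shows that this remains true when the number of sub-intervals grows at the rate in (40).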



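One can show that

$$
\begin{aligned}
a[j]^{(k+1)(1-\epsilon)}\phi_k^{(j)}
&= \exp\Big((k+1)(1-\epsilon)\log a[j] + \log\phi_k^{(j)}\Big) \\
&= \exp\Big((k+1)\sum_{l=1}^{m}\Big((1-\epsilon)\pi[j]p_{jl} - \frac{n(j,l,k)}{k+1}\Big)\log a(s[j],s[l])\Big) \\
&= \exp\Big((k+1)\sum_{l=1}^{m} d(j,l,k)\,\log a(s[j],s[l])\Big),
\end{aligned}
\qquad (61)
$$

where

$$d(j,l,k) := (1-\epsilon)\pi[j]p_{jl} - \frac{n(j,l,k)}{k+1}. \qquad (62)$$

Fig. 7. The location of $x_0^{(j)}$ in a sub-interval of $B^{(j)}$ and the distribution of $\hat{x}_{0,k}^{(j)}$, with mean $(1 - (\phi_k^{(j)})^2)\,x_0^{(j)}$ and variance $(\phi_k^{(j)})^2\Sigma_{k+1}^{(j)}$.

If $p_{jl} = 0$ (noting that $\pi[j] \neq 0$ by ergodicity), then $d(j,l,k) \le 0$ for any $S_0^k$ in $\Omega_{k,\mu}$. If, however, $p_{jl} \neq 0$, then let

$$\frac{n(j,l,k)}{n(j,k)} = p_{jl} + \lambda_1\mu_1, \qquad \frac{n(j,k)}{k+1} = \pi[j] + \lambda_2\mu_1, \qquad (63)$$

where $\lambda_1$ and $\lambda_2$ are such that $|\lambda_1| \le 1$ and $|\lambda_2| \le 1$ by the definition of $\Omega_{k,\mu_1}$. Then it holds that

$$
\begin{aligned}
\frac{n(j,l,k)}{n(j,k)\,p_{jl}}\cdot\frac{n(j,k)}{(k+1)\,\pi[j]}
&= 1 + \mu_1\left(\frac{\lambda_1}{p_{jl}} + \frac{\lambda_2}{\pi[j]} + \frac{\mu_1\lambda_1\lambda_2}{p_{jl}\pi[j]}\right) \\
&\ge 1 - \mu_1\left(\frac{1}{p_{jl}} + \frac{1}{\pi[j]} + \frac{\mu_1}{p_{jl}\pi[j]}\right).
\end{aligned}
\qquad (64)
$$

Therefore, for sufficiently small $\mu_1 > 0$ (more specifically, $\mu_1 := \mu_1(\epsilon,j,l)$ depends on $\epsilon$, $j$, and $l$ but is independent of $k$), the left-hand side of (64) is at least $1-\epsilon$, and hence $d(j,l,k)$ is non-positive for any $S_0^k$ in $\Omega_{k,\mu_1}$. Furthermore, as the choices of $j$ and $l$ are finite, there exists a sufficiently small $\mu_2 > 0$ ($\mu_2 := \mu_2(\epsilon)$ depends on $\epsilon$ but is independent of $k$) such that $d(j,l,k)$ is non-positive for any $S_0^k$ in $\Omega_{k,\mu_2}$.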

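Let $a_{\min} := \min_{j,l} a(s[j],s[l]) > 1$ and $\tilde{a} := a_{\min}^{\epsilon} > 1$. It then follows from (61), (62), and $d(j,l,k) \le 0$ that

$$a[j]^{(k+1)(1-\epsilon)}\phi_k^{(j)} \le a_{\min}^{(k+1)\sum_{l=1}^{m}d(j,l,k)} = a_{\min}^{-\epsilon\,\eta_k^{(j)}} = \tilde{a}^{-\eta_k^{(j)}}, \qquad (65)$$

where we have defined

$$\eta_k^{(j)} := n(j,k) - (k+1)\left(\frac{1}{\epsilon} - 1\right)\left(\pi[j] - \frac{n(j,k)}{k+1}\right). \qquad (66)$$

Therefore, denoting the first term of (60) by $Q_{k,1}$, we have

$$Q_{k,1} \le Q\big(\tilde{a}^{\,\eta_k^{(j)} + \zeta_k}\big),$$

where

$$\zeta_k := \log_{\tilde{a}}\frac{0.5\,\theta_k^{(j)} + \tilde{a}^{-\eta_k^{(j)}}\phi_k^{(j)}x_0^{(j)}}{\sqrt{\Sigma_{k+1}^{(j)}}}, \qquad (67)$$

with $\theta_k^{(j)} := M_k^{(j)}/\lfloor M_k^{(j)} \rfloor \in [1,2)$.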

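Similar to the way that we chose $\mu_2$, we may choose $\mu_3 := \mu_3(\epsilon)$ sufficiently small such that for any $S_0^k \in \Omega_{k,\mu_3}$ and for any $j$ and $k$, it holds that

$$\eta_k^{(j)} > (k+1)\,\sigma_1, \qquad (68)$$

with $\eta_k^{(j)}$ as defined in (66); in fact, if

$$\pi[j] - \frac{\mu_3}{\epsilon} > \sigma_1 \qquad (69)$$

holds for some $\mu_3 > 0$ and $\sigma_1 > 0$ and for all $j = 1, 2, \cdots, m$, then (68) is true. Here $\sigma_1$ depends on $\epsilon$ but does not depend on $k$. Note that such $\mu_3$ and $\sigma_1$ exist since $m$ is finite. Moreover, by $\theta_k^{(j)} \in [1,2)$, $x_0^{(j)} \in (-0.5, 0.5)$, $\phi_k^{(j)} \in (0,1]$, and $\Sigma_{k+1}^{(j)} \in (\underline{c}, \bar{c})$ for any $j$, $k$, and $S_0^k$ (by Lemma 2), one can easily show that $\zeta_k$ is uniformly bounded, namely $\zeta_k \in (\underline{\zeta}_1, \bar{\zeta}_1)$, where the bounds are independent of $j$, $k$, $x_0$, and $S_0^k$.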
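Now pick $\mu_4 > 0$ sufficiently small such that $\mu_4 \le \min\{\mu_2, \mu_3\}$. Consequently, for any $S_0^k \in \Omega_{k,\mu_4}$, it holds that

$$\left|\left(\frac{1}{\epsilon} - 1\right)\left(\pi[j] - \frac{n(j,k)}{k+1}\right)\right| \le \left(\frac{1}{\epsilon} - 1\right)\mu_4. \qquad (70)$$

This yields that

$$
\begin{aligned}
\eta_k^{(j)} &\ge (k+1)\left(\pi[j] + \lambda\mu_4 - \left(\frac{1}{\epsilon} - 1\right)\mu_4\right) \\
&\ge (k+1)\left(\pi[j] - \frac{\mu_4}{\epsilon}\right) \\
&> (k+1)\,\sigma_1,
\end{aligned}
\qquad (71)
$$

where $|\lambda| \le 1$ and $\sigma_1 > 0$. Thus, $(\eta_k^{(j)} + \zeta_k)$ goes to infinity and $Q_{k,1}$, the first term of (60), vanishes. More precisely,

$$Q_{k,1} \le Q\big(\tilde{a}^{(k+1)\sigma_1 + \underline{\zeta}_1}\big). \qquad (72)$$

Similarly, for the second term of (60),

$$Q_{k,2} := Q\!\left(\frac{0.5/\lfloor M_k^{(j)} \rfloor - (\phi_k^{(j)})^2x_0^{(j)}}{\phi_k^{(j)}\sqrt{\Sigma_{k+1}^{(j)}}}\right) \le Q\big(\tilde{a}^{(k+1)\sigma_1 + \underline{\zeta}_2}\big) \qquad (73)$$

for any $S_0^k \in \Omega_{k,\mu_5}$ and for some $\underline{\zeta}_2$, where $\mu_5 > 0$.

Letting $\mu := \min\{\mu_4, \mu_5\}$ and $\zeta := \min\{\underline{\zeta}_1, \underline{\zeta}_2\}$, we have that

$$PE_{k|S}^{(j)} = Q_{k,1} + Q_{k,2} \le 2Q\big(\tilde{a}^{(k+1)\sigma_1 + \zeta}\big), \qquad (74)$$

which decays to zero as $k$ tends to infinity for any $S_0^k \in \Omega_{k,\mu}$. Then invoking the union bound

$$PE_{k|S} = 1 - \prod_{j=1}^{m}\big(1 - PE_{k|S}^{(j)}\big) \le \sum_{j=1}^{m}PE_{k|S}^{(j)} \le 2mQ\big(\tilde{a}^{(k+1)\sigma_1 + \zeta}\big), \qquad (75)$$

we conclude that $PE_{k|S}$ would converge to zero on $\Omega_{k,\mu}$ for sufficiently small $\mu$. Thus we prove that $PE_k$ decays to zero, i.e., rate $R$ is achievable.^9 □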

^9 We may also employ a modified decoding after we obtain $\hat{x}_{0,k}$, by letting $\hat{W}_k$ be the sub-interval center closest to $(I - (\Phi_k)^2)^{-1}\hat{x}_{0,k}$. This removes the estimate bias. The asymptotic behavior analysis of the communication scheme remains the same.

B. Power computation

Proposition 2. Assume the hypotheses of Theorem 1. Then the average channel input power is

$$P := \limsup_{k\to\infty} P_k = \sum_{j=1}^{m}\pi[j]\,\gamma(s[j]), \qquad (76)$$

and hence satisfies power constraint (49).

Remark 2. The idea of the proof is as follows. We show that the average power of the $i$th subsystem converges to $\gamma(s[i])$, and the $i$th subsystem is selected to generate the channel input if and only if the previous fade was $s[i]$. Hence, the average power used by the communication scheme (18) converges to the weighted sum of $\gamma(s[i])$, the optimal power. Note that the channel input depends on $S_{k-1}$, and hence the channel input power at each time depends on the previous channel state. This does not contradict (20), in which $\gamma(S_{k-2})$ is used for generating $x_k$. This is because, by design, the effect of $\gamma(S_{k-2})$ would not be reflected in the channel input at time $(k-1)$, the immediate next step, but at time $t > (k-1)$ such that $S_{t-1} = S_{k-2}$. In other words, the channel input power at $t$ depends on $S_{k-2}$, or effectively depends on $S_{t-1}$.

Proof: See Appendix D. □

Combining Propositions 1 and 2, we have completed the proof of Theorem 1.



C. Further performance analysis: Doubly exponential decay for typical state sequences 

In this subsection, we show that the proposed feedback communication scheme leads to a doubly 
exponential decay of the probability of error when the channel state behavior is "typical", as defined 
below. 

Definition 3. Define the typical set of sequences $\{S_k\}$ as

$$\Omega_{TYP} := \left\{ \{S_k\} : \frac{n(j,k)}{k+1} \to \pi[j],\ \frac{n(j,l,k)}{n(j,k)} \to p_{jl} \text{ as } k \to \infty, \text{ for all } j, l \right\}. \tag{77}$$

Each sequence in $\Omega_{TYP}$ is called a typical state sequence.
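As a rough numerical illustration of this notion of typicality, the sketch below simulates a two-state chain and reports the empirical visit and transition fractions. It assumes that $n(j,k)$ counts the visits to state $s[j]$ among $S_0, \ldots, S_k$ and that $n(j,l,k)$ counts the transitions from $s[j]$ to $s[l]$; the function names and the seed are hypothetical, and the transition probabilities are the ones quoted in the caption of Fig. 9.

```python
import random

def simulate_chain(P, k, seed=0):
    """Return a state path S_0..S_k of a finite Markov chain with transition matrix P."""
    rng = random.Random(seed)
    m = len(P)
    path = [rng.randrange(m)]
    for _ in range(k):
        row = P[path[-1]]
        path.append(rng.choices(range(m), weights=row)[0])
    return path

def empirical_frequencies(path, m):
    """Visit fractions n(j,k)/(k+1) and transition fractions n(j,l,k)/n(j,k)."""
    visits = [0] * m
    trans = [[0] * m for _ in range(m)]
    for s in path:
        visits[s] += 1
    for s, t in zip(path, path[1:]):
        trans[s][t] += 1
    visit_frac = [v / len(path) for v in visits]
    # Normalise by the number of departures from j (differs from n(j,k) by at most one).
    trans_frac = [[trans[j][l] / max(1, sum(trans[j])) for l in range(m)]
                  for j in range(m)]
    return visit_frac, trans_frac

P = [[0.65, 0.35], [0.62, 0.38]]   # p11 = 0.65, p22 = 0.38
pi = [0.62 / 0.97, 0.35 / 0.97]    # stationary distribution of this two-state chain
path = simulate_chain(P, k=200_000, seed=1)
visit_frac, trans_frac = empirical_frequencies(path, m=2)
print(visit_frac, trans_frac)
```

For a long path the printed fractions should be close to `pi` and to the rows of `P`, which is the behaviour that membership in the typical set requires in the limit.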

By ergodicity of $\{S_k\}$, it holds that $\Pr(\Omega_{TYP}) = 1$. Hereafter, by "with probability one" or "almost
surely" or "a.s." we mean "for every channel state sequence $\{S_k\} \in \Omega_{TYP}$" or "for every typical state
sequence". Then by the above definition, it holds that

$$\frac{n(j,k)}{k+1} \to \pi[j], \qquad \frac{n(j,l,k)}{n(j,k)} \to p_{jl}, \qquad \frac{n(j,l,k)}{k+1} \to \pi[j]\,p_{jl} \tag{78}$$

as $k$ tends to infinity.
We then have



Proposition 3. Assume the hypotheses of Theorem 1. Then
i) For sufficiently large $k$, for any $j$ and $\{S_k\} \in \Omega_{TYP}$,

$$PE^{(j)}_{k|S} < \beta_1 \exp\left[ -\exp\left( (k+1)\beta_2 + o(k+1) \right) \right] \tag{79}$$

for some $\beta_1, \beta_2 > 0$, where $o(k)$ denotes the terms with lower order than $k$, namely $o(k)/k$ vanishes as
$k$ tends to infinity;
ii) For any $j$ and $\{S_k\} \in \Omega_{TYP}$, the decay exponent for $PE^{(j)}_{k|S}$ is

$$e^{(j)} := \lim_{k\to\infty} \frac{1}{k} \log\left( \log\left( \frac{1}{PE^{(j)}_{k|S}} \right) \right) = 2\epsilon \log(a[j]), \tag{80}$$

and the decay exponent for $PE_{k|S}$ is

$$e := \lim_{k\to\infty} \frac{1}{k} \log\left( \log\left( \frac{1}{PE_{k|S}} \right) \right) = \min_j e^{(j)} = 2\epsilon \log a, \tag{81}$$

where $a := \min_j a[j]$.

In other words, this proposition claims that, for each sub-codeword $x_0^{(j)}$, essentially its probability of
error decays doubly exponentially, and hence $PE_{k|S}$ decays doubly exponentially with respect to $k$ for
large enough $k$, provided that we exclude a set of $\{S_k\}$ that occurs with zero probability and on
which the signalling rate cannot be achieved. The decay exponent of $PE_{k|S}$ is given by the smallest
(slowest) decay exponent of $PE^{(j)}_{k|S}$ among all $j$. When the channel has no fading, i.e., $m = 1$ and
$a = a(s[j], s[l])$ for all $j$ and $l$, the decay exponent becomes $e = 2\epsilon \log a = 2(C - R)$, and we recover
the decay exponent obtained in [10], [11].

Remark 3. To be more precise, this double-exponential decay argument builds on an infinite-length
channel state realization sequence $\{S_k\}$. To each fixed coding length $(k+1)$ there corresponds a
communication problem in which the channel state sequence is $S_0^k$, a prefix of $\{S_k\}$. Each problem has
its own message (one out of the possible messages) to be transmitted, and each problem generates
its own AWGN. A probability of error $PE_{k|S}$ is then obtained whose upper bound vanishes doubly
exponentially as $k$ goes to infinity, if $\{S_k\}$ is typical. Since typical state sequences form a probability-one
set, we conclude that, with probability one, $PE_{k|S}$ decays doubly exponentially; in other words,
almost every "sample trajectory" of $PE_{k|S}$ decays doubly exponentially. It is also worth noting that
the average probability of error $PE_k$ (see (46)) decays only singly exponentially, though $PE_{k|S}$ decays
doubly exponentially with probability one. In fact, this proposition concerning $PE_{k|S}$ is based on the
Strong Law of Large Numbers, whereas the decay of $PE_k$ is based on the Weak Law of Large Numbers.
The Strong Law (and hence the doubly exponential statement for $PE_{k|S}$ over $\Omega_{TYP}$) holds for the
set of channel state sequences $\{S_k\}$ of infinite length, whereas the Weak Law (and hence the singly
exponential statement for $PE_k$ over $\Omega_k$) holds for $\{\Omega_k\}_{k=0}^{\infty}$, the sequence of sets consisting of
channel state sequences $S_0^k$ of finite length (though $k$ can be very large, it is not infinity).

Remark 4. Despite the fact that all sequences $\{S_k\} \in \Omega_{TYP}$ have the same decay exponent, the
convergence to the decay exponent is not uniform in $k$ and not uniform over $\Omega_{TYP}$. This is because, for
any given sequence, how far $(1/k)\log(\log(1/PE_{k|S}))$ is from the decay exponent depends on the
$(k+1)$-length prefix of the sequence, which varies with $k$ and from sequence to sequence.

Remark 5. Similar to the AWGN case, the doubly exponential decay is possible if an average channel 
input power constraint is used (cf. [44]); a singly exponential decay is expected if a peak power constraint 
is used. 

Proof: See Appendix E. □

Footnote: If the channel state sequence is not typical, then there exists at least one channel state $s[j]$ whose subsystem uses the
forward channel and feedback channel less often than it typically does, resulting in possibly insufficient refinement of the
sub-codeword $x_0^{(j)}$; i.e., $\hat{x}_{0,k}^{(j)}$ may not be close enough to $x_0^{(j)}$. Thus, the transmitted codeword may not be correctly decoded. In
other words, the probability of error does not decay to zero and the signalling rate is not achievable.

Footnote: The message and the noise in the problem for $(k+1)$ are not necessarily nested in the problem for $(k+2)$, as the upper
bound of the probability of error does not depend on which message is selected or on the realizations of the noise.

D. Transmission of Gaussian random vectors

With some modifications we can transmit a Gaussian random vector over channel F, in parallel with
the AWGN channel case. Let us use the same parameters given in (19)-(22) and $x_0 \sim \mathcal{N}(0, \mathcal{P} I_m)$,
and follow the dynamics (18). For a given channel state sequence $\{S_k\} \in \Omega_{TYP}$, we obtain the MSE
distortion as

MSE(xo,fc) := E(xo - Xo,fc)(xo - Xo^kY = i<^kfBxk+ix'k+i, (82) 
which, by rate-distortion theory, requires an asymptotic rate to be at least 



k 



lim — — r- log — — — r- = lim — — log ■ 



.oo2(A; + l) detMSE(xo,fc) k^^ 2{k + 1) lYll^{det{Bxk+ix'^^^)4>'i^y 



(b) 



log a, 



where (a) follows from the boundedness of $\mathcal{P}^m$ and $\det(E x_{k+1} x'_{k+1})$, and (b) follows from a derivation
similar to Appendix A. Since $\log a$ is the capacity, we conclude that $x_0$ is successively refined at the
capacity rate of channel F.



E. Special case: AWGN i.i.d. fading channel 

Theorem 1 directly applies to the case in which $\{S_k\}$ forms a discrete i.i.d. process. However, a simplified
capacity-achieving feedback scheme with a scalar transmitter state exists. Assume that the channel state
has an i.i.d. distribution given by

$$\Pr(S_k = s[i]) = p[i] \quad \text{for } k = 0, 1, \ldots, \tag{83}$$

where, for $i = 1, 2, \ldots, m$, $p[i]$ and $s[i]$ are fixed numbers. Given any power budget $\mathcal{P} > 0$, we choose
the parameters in the communication scheme as

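$$\begin{aligned}
A(S_{k-2}, S_{k-1}) &:= A(S_{k-1}) := \sqrt{(S_{k-1})^2 \mathcal{P} + 1} \in \mathbb{R}, \\
b(S_{k-2}, S_{k-1}) &:= b(S_{k-1}) := \frac{S_{k-1}\,\mathcal{P}}{\sqrt{(S_{k-1})^2 \mathcal{P} + 1}}, \\
c(S_{k-1}) &:= c := 1 \in \mathbb{R}.
\end{aligned} \tag{84}$$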

Note that $A$ and $b$ in this design do not require the augmented channel state $(S_{k-2}, S_{k-1})$; $S_{k-1}$ is
sufficient. As a consequence, no multiplexing is needed. The scalar dynamics of the associated control
setup evolves according to

$$x_k = A(S_{k-1})^{-1} x_{k-1} - b(S_{k-1}) N_{k-1}, \tag{85}$$

where $x_k$ is a scalar. We can show that this design leads to a transmission rate arbitrarily close to the
capacity (proof skipped for brevity)

$$C_{iid} = \frac{1}{2} \sum_{i=1}^{m} p[i] \log(1 + s[i]^2 \mathcal{P}) = \frac{1}{2} E_S \log(1 + S^2 \mathcal{P}). \tag{86}$$

Note that no power adaptation is present in the capacity formula or in the proposed scheme.
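The link between the scalar dynamics and the capacity formula can be checked numerically. The following minimal sketch assumes $A(S) = \sqrt{S^2 \mathcal{P} + 1}$, so that the conditional mean of the state contracts by $A(S_{k-1})^{-1}$ per step, and compares the empirical exponential contraction rate with $C_{iid}$ in nats. The fade values, probabilities, and function names are hypothetical and chosen only for illustration.

```python
import math
import random

def iid_capacity(states, probs, power):
    """C_iid = 1/2 * sum_i p[i] * log(1 + s[i]^2 * P), in nats per channel use."""
    return 0.5 * sum(p * math.log(1.0 + s * s * power)
                     for s, p in zip(states, probs))

def mean_decay_rate(states, probs, power, k, seed=0):
    """Empirical -(1/k) * log|E x_k / x_0| for the mean recursion
    E x_k = A(S_{k-1})^{-1} E x_{k-1}, with A(S) = sqrt(S^2 * P + 1)."""
    rng = random.Random(seed)
    log_gain = 0.0
    for _ in range(k):
        s = rng.choices(states, weights=probs)[0]
        log_gain += 0.5 * math.log(s * s * power + 1.0)  # log A(S)
    return log_gain / k

states, probs, power = [2.0, 1.0], [0.6, 0.4], 3.0   # illustrative values only
cap = iid_capacity(states, probs, power)
rate = mean_decay_rate(states, probs, power, k=100_000, seed=1)
print(cap, rate)
```

By the law of large numbers the two printed numbers should nearly coincide, which is consistent with the scheme refining the message at a rate close to $C_{iid}$.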
1) Extension: AWGN i.i.d. fading channel with infinite channel states: AWGN i.i.d. fading channels
with infinite channel states include many channels as special cases, such as the Rayleigh, Rician,
Nakagami, and Weibull fading channels. Here we focus on the scenario of real channel state-spaces; the
scenario of complex channel state-spaces can be studied likewise. Assume that the channel state forms
an i.i.d. process with density $p_S(s)$ defined on $\mathbb{R}$ and that the first and second moments exist. Then the
channel capacity is

$$C_{iid,inf} = \frac{1}{2} E_{S \sim p_S} \log(1 + S^2 \mathcal{P}). \tag{87}$$

Then we construct a coding scheme with a scalar transmitter state as in the finite state-space case, using
the choice of parameters given in (84). As we show in Appendix G, this scheme achieves any rate
below the feedback capacity given in (87). We point out that the proof makes use of the fact that the
transmitter can be designed as a scalar system, and hence this proof may not be directly applicable to
Markov channels with infinite states.

V. Numerical example 

Consider a Gilbert-Elliott fading channel with AWGN, i.e., an AFSMC with only two states, as
illustrated in Fig. 8. We simulate the proposed scheme for this channel. Fig. 9(a) shows the simulated
$PE^{(j)}_{k|S}$ and $PE_{k|S}$ for a randomly chosen $\{S_k\}$, as well as the theoretic $PE_{k|S}$ computed from (60) and
(45), where $\epsilon > 0$ is a slack from the Shannon capacity $C$, i.e., the signalling rate is $R = (1-\epsilon)C$. We see
that the probability of error decays rather fast within 20 channel uses. However, the decay of $PE^{(j)}_{k|S}$ and
$PE_{k|S}$ is not quite smooth. This is caused by instantaneous deviations from the typical channel state behavior
(namely, for some $k$, either $|n(j,k)/(k+1) - \pi[j]|$, or $|n(j,l,k)/n(j,k) - p_{jl}|$, or both, are not sufficiently
close to zero), even though $\{S_k\}$ may be typical; see also Remark 3. This may be improved by considering a
"turbo mode" of using larger power at the moments with large instantaneous deviations from the typical
state behavior, which does not affect the average power constraint (under further investigation); see [45]
for the idea of the turbo mode. Fig. 9(b) shows the decay of $PE_k$. These fast decays imply that
the proposed scheme allows shorter coding lengths and shorter coding delays; here the coding delay
measures the number of time steps that one has to wait for the message to be decoded at the receiver with a small
enough probability of decoding error. The short coding delay is also reflected in Fig. 9(c), where we
compare the message and the decoded message bit by bit and count how many bits are correctly obtained
by the receiver. At time $k = 24$, the channel can transmit 35.8 bits if at each step the capacity $C$ is
attained, and the simulation shows that on average 34.9 bits are actually correctly decoded.



Fig. 8. Gilbert-Elliott fading channel with AWGN (two states, with transition probabilities $p_{12}$ and $p_{21}$).

Therefore, though the availability of output feedback at the transmitter does not affect the capacity
when DTRCSI is available, we have seen that output feedback can considerably simplify the coding
design and coding process while achieving a rate arbitrarily close to the capacity (in contrast to some
sophisticated designs that approach the capacity using turbo codes or LDPC codes; see e.g. [46]), and it
leads to a better performance in terms of probability of error than the coding schemes without feedback
in the literature. Recall that turbo codes or LDPC codes typically need long coding lengths of at least
several thousands to achieve a reasonable performance (in [46], a coding length of 8000 was used over
a Markov channel). However, since power adaptation has not been employed in those schemes in the
existing literature, a fair and more accurate comparison with our scheme is not yet available.

Footnote: Note again that $PE_k$, the probability of error averaged over $\Omega_k$, decays only singly exponentially. [45] presented a coding
scheme which can reduce the decoding errors caused by atypical channel state sequences for "streaming" communication.
Whether the same idea is applicable here to lead to a doubly exponential decay of the averaged error probability remains to
be seen.

Fig. 9. (a) Simulated $PE^{(j)}_{k|S}$, simulated $PE_{k|S}$, and theoretic $PE_{k|S}$. (b) Theoretic $PE_k$. (c) The number of bits that have
been correctly decided and the number of bits that could be correctly decided if at each step the capacity rate is attained.
$s[1] = 2$, $s[2] = 1$, $p_{11} = 0.65$, $p_{22} = 0.38$, $\mathcal{P} = 3$, and $\epsilon = 0.2$ (i.e. $R = 0.8C$), unless otherwise specified in the legend.

VI. Conclusions and Future Work 

In this paper, we proposed, based on a control-oriented approach, a capacity-achieving feedback 
communication scheme for an AFSMC with precise CSI available to the receiver immediately and to 
the transmitter with unit delay. In essence, this scheme consists of a set of subsystems designed for 
an AWGN channel with feedback, and the subsystems are multiplexed to share the AFSMC and the 
feedback link according to the augmented channel states. The scheme greatly reduces the complexity
of the coding design and coding processes. The error probability decreases to zero doubly exponentially
for typical channel state sequences, and the scheme shortens the coding length compared with existing coding
schemes over an AFSMC in the literature.

In constructing the proposed scheme, we established the equivalence between feedback stabilization 
over an AFSMC and communication with access to noiseless feedback over the same channel. We have 
seen that the utilization of the control-theoretic equivalence of the proposed coding scheme facilitates the 
development. In fact, the ideas of introducing multiplexing and augmented channel states in the feedback 
communication problem originated from the study of the associated control problem; particularly the 
idea of multiplexing was motivated by the control of MJLS, and the idea of augmenting channel states
was motivated by considering control systems with delay. We remark that these two ingredients may 
play a significant role in studying more general feedback communication systems with known channel 
variation (varying with time) and delays. 

There are several open directions for future work. First, note that the assumption of perfect (CSI and 
output) feedback is a major limitation in existing feedback communication studies. The literature on 
imperfect CSI at the transmitter side is vast; see e.g. [3], [8], [9], [47]-[50]. Quantized CSI feedback is 
typically assumed, and usually waterfilling-type capacity formulas are obtained, in which case one may 
extend the multiplexing designs to achieve the capacities. If, on the other hand, the transmitter-side CSI 
is corrupted by sources other than quantization, it remains to be studied how the multiplexing used in 
this paper may be extended, as the receiver may not know how the sub-codewords are multiplexed at 
the transmitter side. However, the fundamental difficulty in noisy feedback lies in the imperfect output 
feedback, as shown in [10], [11]. In the noisy output feedback case, the SK coding scheme (as well as 
its direct extensions) leads to a non-vanishing probability of error for a fixed signalling rate, or leads 
to a vanishing signalling rate if the probability of error decays to zero, as the coding length tends to 
infinity. Nevertheless, it would be useful to study the more realistic problem of communication with
noisy feedback as the ideal problem of communication with noiseless feedback has been resolved. 

In addition, we wish to further explore the role of the cheap control (or its counterpart in estimation 
theory, the Kalman filter) in feedback communication. The Kalman filter, the optimal recursive MMSE 
estimator, can be easily transformed into an optimal feedback communication system achieving the 
Shannon capacity, for either an AWGN channel with output feedback or an AFSMC with output feedback 
and side information (cf. [27]). Moreover, our existing study has revealed further tight connections among 
communication, estimation, and control, which may be seen as an extension of the connections between 
communication and control discovered in e.g. [12] and have been employed to characterize the capacity- 
achieving feedback schemes for Gaussian channels with memory (cf. [16], [51]). 

Finally, we wish to extend the main ideas of this paper to MIMO time-varying fading channels 
with noiseless output feedback. For the case of single-user MIMO Gaussian channels with perfect CSI 
available to the receiver instantaneously and to the transmitter with delay, we expect that our proposed 
communication scheme can be extended relatively easily. When multiple users are present, however, more 
complicated strategies may be needed. The challenges may be seen from the fact that only bounds for the
capacities of some multiple-user channels are available. Multiple-user MIMO problems are especially
challenging. Not surprisingly, challenges are present in both the communication problem and the associated
control problems, especially for some multiple-user MIMO systems. From a control-theoretic perspective,
[12] obtained the feedback capacity or best-known feedback capacity regions for some multiple-user 
time-invariant MIMO Gaussian channels, including multiple-access channels, broadcast channels, and 
symmetric interference channels, and feedback communication schemes to achieve the capacity or 
capacity bounds were also given. However, open problems in feedback communication translate to 
open problems in feedback control. For example, the capacity problem of Gaussian interference channel 
with feedback becomes a structured control optimization problem ([12]). At any rate, applying the
combination of the techniques developed in this paper for SISO time-varying channels and those
developed in [12] for multiple-user MIMO time-invariant channels may serve as an initial step to tackle 
the MIMO time-varying feedback communication problems, and may provide useful insights, possibly 
on both feedback information theory and control theory. 

Appendix 

A. Proof of C = log d 

Assume without loss of generality that Sk = s[i] and Sk+i = s[j], both drawn from the stationary 
distribution of {Sk}- Then it holds that 



^ m m 



2 

i=i j=i 

m I m 



^ ^TT[i\pij\oga{s[i\,s[j\) 
i=i \j=i 



^ log a(s[i]) 

i=l 

log d 



B. Proof of converse 

This proof is motivated by the converse proofs in [4] and in [8].
For any $(M, K+1)$ code, Fano's inequality yields that

$$\begin{aligned}
h(W \mid y_0^K, S_0^K) &= h(W) - I(W; y_0^K, S_0^K) \\
&= \log M - I(W; y_0^K, S_0^K) \\
&\le h(PE_K) + PE_K \log M
\end{aligned} \tag{88}$$

and hence that

$$\frac{1}{K+1} \log M \le \frac{1}{(K+1)(1 - PE_K)} \left( I(W; y_0^K, S_0^K) + h(PE_K) \right), \tag{89}$$

where $W$ is the uniformly distributed message. If the code leads to a vanishing probability of error as
$K$ tends to infinity, namely $PE_K$ (and hence $h(PE_K)$) is vanishing, then the signalling rate $R$ satisfies

$$R \le \limsup_{K\to\infty} \frac{1}{K+1} I(W; y_0^K, S_0^K). \tag{90}$$

In addition, the code needs to satisfy the power constraint

$$\limsup_{K\to\infty} P_K = \limsup_{K\to\infty} \frac{1}{K+1} E \sum_{k=0}^{K} (u_k)^2 \le \mathcal{P}. \tag{91}$$



To see that the above equality is true, by the definition of Pk, we have 

K 

K+~ 



Pk = ^^Y.[ E [^<S^-'mu.\sf)] . (92) 
Note that £(^^15)^ depends only on Sq ^, by a simple counting argument, it holds that
Pk 



K 



(93) 



Thus dlB holds. 

Now note that depends on 5q~^ and ^q"^. We then denote En^, conditioned on 5q^^ and y^"^, 
as Then Ef=o E/fc|5r\yr^ = {K+1)Pk in light of m. Since 

/(Ty;yo^,5o^) 

I{W-y^\S^) 
h{y^\S^)-h{y^\S^,W) 

K 



(a) 



(b) 



(c) 



(d) 

< 



Y,(h{yu\S',-\y',-\S^)-h{y,\S',-\yl-\W,S,,S^^,)^ 

fc=0 
K 

fc=i 

< [Kyk\Sk-i, yo~\Sk) - HN^)) + I{W; yo\So) 

k=l 
K 

(^h{yk\Sk-i,yo~\Sk)-h{yk\Sk-i,yQ~\Sk,Uk)) +I{W;yo\So) 

k=l 
K 

Yj yk\Sk-uyo~^,Sk) + I{uo; yo\So) 
k=i 

(e) ^ ^ 1 

< -^Elog(l + (5,)2/fc|5r\,r) + 2^^°s(^ + (^°)'^°) 



k=l 
K 



iX;E{E [log (1 + 



... 1 fc-l 

li/o 



k=l 
K 



Sk-i,Sk 



} + iElog(l + (5o)2/o) 



< ^J^Elog 1 + (5fc)2E (/,|^.-. 



fc=l 

K 



Sk- 



+ -Elog(l + (5o)Vo) 



JX^Elog (1 + (5fc)2r(5fc_i)) + ^Elog (1 + (5o)Vo) 



k=l 

(94) 

where (a) follows from the independence between $W$ and $S_0^K$, (b) is due to the fact that conditioning reduces
entropy and to the definitions of $u_k$ and $y_k$, (c) is again due to the definitions of $u_k$ and $y_k$, (d) is because of
the data processing inequality, (e) is because a Gaussian input maximizes mutual information under a power
constraint and conditioned on the channel state and past channel outputs, and (f) follows from Jensen's
inequality, the independence between $S_k$ and $y_0^{k-1}$ conditioned on $S_{k-1}$, and the independence between
$S_k$ and $S_0^{k-2}$ conditioned on $S_{k-1}$.
Therefore, we have that

$$I(W; y_0^K, S_0^K) \le \frac{1}{2} \sum_{k=1}^{K} E \log\left(1 + (S_k)^2 \Gamma(S_{k-1})\right) + \frac{1}{2} E \log\left(1 + (S_0)^2 \Gamma_0\right) \tag{95}$$

subject to the power constraint $\sum_{k=0}^{K} E\,\Gamma(S_{k-1}) = (K+1) P_K$. Hence, by the stationarity and ergodicity
of the channel state process, it holds that

$$R \le \frac{1}{2} E \log\left(1 + (S_k)^2 \Gamma(S_{k-1})\right) \tag{96}$$

where $S_{k-1}$ follows the stationary distribution and $E\,\Gamma(S_{k-1}) \le \mathcal{P}$. Finally, we have $R \le C$ by the
optimality of $\gamma(\cdot)$.

C. Proof of Lemma 2

i) The dynamics of $x_k^{(j)}$ without external input is $x_{k+1}^{(j)} = a(S_{k-1}, S_k)^{-1} x_k^{(j)}$ if $S_{k-1} = s[j]$, or
$x_{k+1}^{(j)} = x_k^{(j)}$ otherwise. So we have



and hence $\Phi_k$ is the state transition matrix for (23). To show the convergence in probability, we need
to show that for any $\epsilon > 0$ small enough, it holds that

$$\Pr\left(|\phi_k^{(j)}| < \epsilon\right) \to 1, \tag{97}$$



in other words. 



Pr 



m 



1=1 



< e ^ 1. (98) 



Since $|a(s[j], s[l])| > 1$ for all $j$ and $l$, by the Continuous Mapping Theorem, it is sufficient to show
that for any $M > 0$ large enough, there exists at least one $l$ such that, as $k$ goes to infinity,

$$\Pr(n(j,l,k) > M) \to 1. \tag{99}$$

Notice that $p_{jl} > 0$ for some $l$; then (99) is true because

$$\Pr\left( n(j,l,k) > (k+1)(\pi[j] p_{jl} - \delta) \right) \to 1 \tag{100}$$

for any $\delta > 0$ sufficiently small (noting that $\pi[j] > 0$ by ergodicity).

ii) Conditioned on and Xq, we obtain that 

e4+i = 4'^4'^ (101) 

and hence, by i), the boundedness of Ex^ on for each k as well as the convergence in probability 
of Ex^j^^ to zero. 

iii) We first show the independence of $x_k^{(j)}$ and $x_k^{(l)}$ when $j \ne l$. To show this, notice that, conditioned
on $x_0$ and $\{S_k\}$, both $x_k^{(j)}$ and $x_k^{(l)}$ are Gaussian; this is because the system is linear and the
only randomness in $x_k^{(j)}$ and $x_k^{(l)}$ is from the AWGN $\{N_k\}$. Then it only remains to show that they are
uncorrelated, namely

$$E x_k^{(j)} x_k^{(l)} = E x_k^{(j)}\, E x_k^{(l)}. \tag{102}$$

If $k = 0$, obviously (102) holds. Suppose that (102) holds for some $k$; then for $(k+1)$ and for any $i$,

$$x_{k+1}^{(i)} = \begin{cases} a(S_{k-1}, S_k)^{-1} x_k^{(i)} - b(S_{k-1}, S_k) N_k & \text{if } S_{k-1} = s[i], \\ x_k^{(i)} & \text{otherwise.} \end{cases} \tag{103}$$
So



E[a(5fc_i,5fe)-i4'^ - b{Sk-.uSk)Nk]x 



1^(0 



(0 



if Sk-i / s[j] and S'fc-i / s[/] 
if 5'fe-i = s[j] 

if = 



Ea(5fc_i, 
Exi;'^(5fc_i,5fc) 
Exg,Ex«, 
Ea(S'fc_i, S'fe)" 



-i^(i)^(0 



-1^(0 



Ex^-'^Ea(5fc_i,5fc) 
Exg,Ex«,. 



1^(0 



if Sfe.i / s[j] and 5'fc„i / s[Z] 
if Sk-i = s[j] 
if = 

if / and 5fc_i / s[l] 
if = 
if Sk-i = s[l] 



(104) 



Thus, (102) holds for any $k \in \mathbb{N}$ by induction.

By the independence of x^j^'^ and x^.''' when j ^ I, is a diagonal matrix. If S^-i 



s[j] then 



E(x 



k+l) 



< 



a{Sk-i, Skr^Eix'j^^f + b{Sk-i, Sk) 
a~'B{x^^^f + b^ 



where a := mmj^i a{s[j], s[l]) and b := maxj ^ s[Z])|; or if, however, 7^ s[j], then 

E(4' 



Since < a ^ < 1, for any k, 



B{x): 



(i)N2 



< 



< 



< 



2K(ij)\2 



~2K 







+ 



(105) 



0.5^ + 



where K > (dependent on k and {5^}). Since 'E{xf'Y > where 6 := min^y I^C'Sb'], [^])| > 0, 
E(iCfc '')^ is bounded from both above and below by some positive constants for all k; note that the 



constants can be chosen as independent on k, Xq, and {5^}. Notice that S^^^ := E(x 



„(j)-)2 



(Ex5f'^)2 is 



Strictly positive since the randomness in the noise enters the system. Then, because |Ex): | decreases to 

(7) 

zero monotonically as k increases, is uniformly bounded from above and from below by positive 
constants for any xq, {Sk}, and k. 



D. Proof of Proposition 2

To prove this proposition, first note that (91) holds. It is sufficient to show that

$$E(u_k)^2 \to \sum_{i=1}^{m} \pi[i]\,\gamma(s[i]). \tag{106}$$

Then by the Cesaro mean (i.e., if $a_n$ converges to $a$, then the average of the first $n$ terms converges to
$a$ as $n$ goes to infinity), the limit in the right-hand side of (91) exists and the result follows.
Now let us study the recursion for $E(x_k^{(j)})^2$. Assume $S_{k-2} = s[j]$ and $S_{k-1} = s[l]$. Then (26) yields
that

$$\begin{aligned}
E(x_k^{(j)})^2 &= a(s[j], s[l])^{-2} E(x_{k-1}^{(j)})^2 + b(s[j], s[l])^2 \\
&\overset{(a)}{=} a(s[j], s[l])^{-2} E(x_{k-1}^{(j)})^2 + a(s[j], s[l])^{-2} \gamma(s[j])^2 s[l]^2 \\
&= a(s[j], s[l])^{-2} \left( E(x_{k-1}^{(j)})^2 + \gamma(s[j])^2 s[l]^2 \right),
\end{aligned}$$

where (a) follows from (21). Subtracting $\gamma(s[j])$ from both sides, we obtain that

$$\begin{aligned}
E(x_k^{(j)})^2 - \gamma(s[j]) &= a(s[j], s[l])^{-2} \left( E(x_{k-1}^{(j)})^2 + \gamma(s[j])^2 s[l]^2 - a(s[j], s[l])^2 \gamma(s[j]) \right) \\
&= a(s[j], s[l])^{-2} \left( E(x_{k-1}^{(j)})^2 - \gamma(s[j]) \right),
\end{aligned}$$

and thus, conditioned on a channel state sequence $S_0^k$,

$$E(x_k^{(j)})^2 - \gamma(s[j]) = (\phi_{k-1}^{(j)})^2 \left( E(x_0^{(j)})^2 - \gamma(s[j]) \right). \tag{107}$$

It follows from Lemma 2 i) that $\left( E(x_k^{(j)})^2 - \gamma(s[j]) \right)$ converges in probability to zero, namely $E(x_k^{(j)})^2$
converges to $\gamma(s[j])$ in probability. Note that we have omitted the conditioning on $S_0^k$ of $x_k^{(j)}$ to simplify
notation; the obtained $E(x_k^{(j)})^2$ is a random variable depending on the random variable $S_0^k$.

Since the channel input $u$ is the multiplexing of $x_k^{(j)}$, the power of $u$ is the power of $x_k^{(j)}$ averaged
over $j$; therefore we have that, in probability,

$$E(u_k)^2 \to \sum_{j=1}^{m} \pi[j]\,\gamma(s[j]), \tag{108}$$

which implies (106) as $E(u_k)^2$ is deterministic. This completes the proof.
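The contraction of the second moment towards $\gamma(s[j])$ can be traced numerically. The sketch below iterates the second-moment recursion for a single subsystem, assuming the relations $a^2 = 1 + s[l]^2 \gamma(s[j])$ and $b^2 = a^{-2} \gamma(s[j])^2 s[l]^2$ that the algebra of this proof implies; the function name, the starting value, and the fade sequence are hypothetical.

```python
def second_moment_path(v0, gamma, fades):
    """Iterate v <- a^{-2} * v + b^2 for one subsystem, where
    a^2 = 1 + s^2 * gamma and b^2 = a^{-2} * gamma^2 * s^2 (assumed relations)."""
    v = v0
    out = [v]
    for s in fades:
        a2 = 1.0 + s * s * gamma
        v = (v + gamma * gamma * s * s) / a2
        out.append(v)
    return out

gamma = 3.0
fades = [2.0, 1.0, 1.0, 2.0, 2.0, 1.0]   # fades seen while this subsystem is active
path = second_moment_path(v0=10.0, gamma=gamma, fades=fades)
gaps = [v - gamma for v in path]          # each gap equals the previous one times a^{-2}
print(gaps)
```

Each entry of `gaps` is the previous one divided by $1 + s^2\gamma$, so the gap shrinks geometrically regardless of the starting power, in line with (107).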

E. Proof of Proposition 3
We first strengthen Lemma 2.

Lemma 3. Assume the hypotheses of Theorem\I} and fix a channel state sequence {S^} in Vt. Then for 
the control setup f |2jD , 

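i) It holds that, for each $j$,

$$\lim_{k\to\infty} \phi_k^{(j)} = 0 \quad \text{if } \{S_k\} \in \Omega_{TYP}; \tag{109}$$

ii) For any fixed initial condition $x_0$, it holds that

$$\lim_{k\to\infty} E x_k = 0 \quad \text{if } \{S_k\} \in \Omega_{TYP}. \tag{110}$$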

The proof of the lemma is as follows. By (77), on $\Omega_{TYP}$, $n(j,k)$ goes to infinity as $k$ goes to infinity,
since $\pi[j] > 0$. Because $0 < a(S_{k-1}, S_k)^{-1} < 1$, $\phi_k^{(j)}$ goes to zero on $\Omega_{TYP}$. Moreover, conditioned on
$S_0^k$ and $x_0$, we obtain that

$$E x_{k+1}^{(j)} = \phi_k^{(j)} x_0^{(j)}, \tag{111}$$

and hence $E x_k$ converges to zero on $\Omega_{TYP}$. □
Now we are ready to prove the proposition. Fix any $\{S_k\} \in \Omega_{TYP}$. Recall that

$$PE^{(j)}_{k|S} = Q_{k,1} + Q_{k,2}, \tag{112}$$

where $Q_{k,1}$ satisfies



Q,,<Q\ + I , (113) 
and $Q_{k,2}$ satisfies a similar inequality, for sufficiently large $k$. This is because, for any $\mu > 0$, there exists
$k$ large enough such that the prefix $S_0^k$ of $\{S_k\}$ is in $\Omega_{k,\mu}$, so that the result in the proof of Proposition 1
can be used here. Let



(114) 

Then 

Qm = Q («"^'''^+'^) , (115) 

Since $n(j,k)/(k+1)$ converges to $\pi[j]$ when $k$ goes to infinity for any $\{S_k\} \in \Omega_{TYP}$, we have that
$\delta_k/(k+1)$ vanishes when $k$ goes to infinity (noting that $\delta_k$ is not viewed as a random variable here,
since $\{S_k\}$ is viewed as a realization). That is,

Qk,i<Q(a''^^'^^+''^^+^A . (116) 



The Chernoff bound of the Q-function (cf. [52]) says that

$$Q(t) < \frac{1}{\sqrt{2\pi}\,t} \exp\left(-\frac{1}{2} t^2\right) = \exp\left( -\frac{1}{2} \left( t^2 + \log(2\pi t^2) \right) \right).$$
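This bound is easy to sanity-check numerically for $t > 0$. The sketch below evaluates the Q-function through the standard identity $Q(t) = \frac{1}{2}\,\mathrm{erfc}(t/\sqrt{2})$ and compares it with the exponential form of the bound; the function names are hypothetical.

```python
import math

def q_function(t):
    """Gaussian tail probability Q(t) = 0.5 * erfc(t / sqrt(2))."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def q_upper_bound(t):
    """exp(-(t^2 + log(2*pi*t^2)) / 2), i.e. exp(-t^2 / 2) / (sqrt(2*pi) * t), for t > 0."""
    return math.exp(-0.5 * (t * t + math.log(2.0 * math.pi * t * t)))

for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(t, q_function(t), q_upper_bound(t))
```

The ratio of the two columns approaches one as $t$ grows, which is why the bound can be treated as tight when the decay exponent is computed.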

Then 

= gj^p|_l(Q,2-)n(i,fc)+ilog„[a°('=+i)+a-^"<^''")log(27r«2"«.'=)+°^ 



exp ( -l(a2)-yKfc+i)+o(fc+i) 



Similarly, 

Qk,2 := Q I ^ - 4= 1 < exp f _i(a2).b1(fc+i)+o(fc+i) 



P^il < 2exp ( -i(a2-b1)(fc+i)+°(fc+i) ) . (117) 



and hence, 

(i) 

To get the (asymptotic) decay exponent of PE^^^, noticing that for k large enough, the Chernoff 
bound becomes tight, we can derive (following the steps similar to above) that 

lim -^log (log ( -3— ) ) = lim -2 V(i(j,/,A;)log(a(s[j],s 
k^ock + 1 V \Qi,kJ J ^ 

m 

= 2e^7r[j>j7log(a(s[j],s 
1=1 

= 2eloga[j]. 



It can be also shown that 



7VT^°S ( (77— ) ) = 2eloga[i]. 
fc^oo fe+ 1 V \Q2,k/ 



It is easily seen that

$$\lim_{k\to\infty} \frac{1}{k} \log\left( \log\left( \frac{1}{\exp(-a^k) + \exp(-b^k)} \right) \right) = \min(\log a, \log b) \tag{118}$$

for $a, b > 0$. (That is, the "average" decay exponent of $(\exp(-a^k) + \exp(-b^k))$ is the smaller decay
exponent for $\exp(-a^k)$ and $\exp(-b^k)$.) Then (80) follows from (118). Finally, notice that asymptotically
$PE_{k|S} = \sum_{j=1}^{m} PE^{(j)}_{k|S}$; then (81) follows from (118).
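The limit of the double logarithm of a sum of two doubly exponential terms can be evaluated without underflow by factoring out the dominant term. The sketch below does this under the assumption that both bases exceed one, so that both terms actually decay; the function name and the test values are hypothetical.

```python
import math

def double_log_rate(a, b, k):
    """(1/k) * log(log(1 / (exp(-a**k) + exp(-b**k)))), evaluated stably for a, b > 1."""
    x, y = sorted((a ** k, b ** k))
    # log(1 / (e^{-x} + e^{-y})) = x - log(1 + e^{-(y - x)}) with x <= y
    inner = x - math.log1p(math.exp(-(y - x)))
    return math.log(inner) / k

print(double_log_rate(1.5, 2.0, 200), math.log(1.5))
```

The slower of the two terms dominates, so the printed rate matches the logarithm of the smaller base.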

Modifications if A1 or A2 does not hold

1) When A1 is not assumed: In this case, there exist some states that are assigned zero power.
Suppose s[J] is the only such state; for cases with more than one such states, the same idea applies. 
We now have a(s[i],s[J]) = 1 and a[J] = 1. Then the Jth subsystem needs some modifications, as 
described below. For the encoding, let x^q^ = and make it known to the receiver. That is, the Jth 
subsystem is not used to transmit any message. It leads to zero signalling rate, zero transmission power, 
and zero probability of error associated with this subsystem. We then see that in this situation, our 
scheme behaves exactly as the capacity solution equation suggests. Namely, at the moment when no 
power is assigned, no message needs to be transmitted, no probability of error is incurred, and there 
is no contribution to the rate. Therefore, it can be easily verified that any rate below the capacity is 
achievable by simply noting that: 1) In Lemma |2j i) and ii) hold for any j ^ J, and iii) holds if 
we modify the definitions of a and 6 as a := min^yj; a(s[j], and b := maxj^j; s[/])|; 2) 
C = E™ilog«(^W) = E.^jloga(s[i]); 3) En2 = EIli vrHE(x('))2 = vr[i]E(x«)2; and 4) 
Proposition [T] holds if we assign Pe'^^^^ = 0. 

2) When A2 is not assumed: In this case there is some i with s[i\ = 0. Then a(s[j],s[i]) = 1, but 
a[j] > 1 still holds for each j with 7(s[j]) / 0. To see this, assume otherwise for some J, a[J] = 1 but 
7(s[J]) > 0. By (l29ll and (|20ll, this would yield that, for each I = 1, - ■ ■ , m, it must hold either i) s[l] = 
or ii) ■k[J]pji = 0. However, ii) is equivalent to pji = since 7r[J] > 0. Hence, for any / such that 

/ 0, Pji has to be 0; i.e., the probability of s[J] jumping to any nonzero state is zero. In other words, 
s[J] must jump to s[J] with probability one, which according to [4] would imply that 7(s[J]) = 0, a 
contradiction. So a[j] > 1 holds as long as 7(5 [j]) 7^ 0, and a(s[j],s[/]) > 1 and T^\j]pji > for some 
/. Thus, (j)^^^ converges to zero following the proof used in Lemma |2]i). Again iii) holds if we modify 
the definitions of a and 6 as a := minj^j^i a{s[j], s[l]) and b := max^yj^/ •s[/])|. For the proof of 

Proposition [U modify the definition of a as a := min^^j ^(•sblj s[l]) > 1. Then Lemma|2]as well as the 
main results hold. 



G. Proof for the case of AWGN i.i.d. fading with infinite state 

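Pick any $\epsilon > 0$ small enough. Uniformly partition the unit interval $[-\frac{1}{2}, \frac{1}{2}]$ into $\lfloor M_k \rfloor$ sub-intervals,
where

$$M_k := \exp\left( (k+1)(1-\epsilon) E \log A \right). \tag{119}$$

Then the asymptotic signalling rate is

$$\begin{aligned}
R &= \lim_{k\to\infty} \left( \frac{\log M_k}{k+1} - \frac{\log\left( M_k / \lfloor M_k \rfloor \right)}{k+1} \right) \\
&= \lim_{k\to\infty} \frac{(k+1)(1-\epsilon) E \log A}{k+1} \\
&= (1-\epsilon) E \log A \\
&= (1-\epsilon) C_{iid,inf}.
\end{aligned} \tag{120}$$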
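The effect of rounding $M_k$ down to an integer number of sub-intervals can be seen directly. The sketch below computes $\log \lfloor M_k \rfloor / (k+1)$ for a target exponent `c`, which stands in for $(1-\epsilon) E \log A$; the function name and the value of `c` are hypothetical.

```python
import math

def floor_rate(c, k):
    """log(floor(M_k)) / (k + 1) with M_k = exp((k + 1) * c), in nats per channel use."""
    m_k = math.exp((k + 1) * c)
    return math.log(math.floor(m_k)) / (k + 1)

c = 0.5   # stands in for (1 - eps) * E log A
for k in (3, 10, 30, 100):
    print(k, floor_rate(c, k))
```

The loss due to the floor is at most $\log 2 / (k+1)$ once $M_k \ge 1$, so the rate approaches `c` as the coding length grows.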
In the i.i.d. case with infinite state, a channel state will be re-visited with zero probability, and hence
the idea used for the AFSMC cannot be applied; notations such as $n(j,l,k)$ and $\phi_k^{(j)}$ that were used to show
stability and vanishing probability of error for the AFSMC make no sense here. The infinite-state i.i.d. case,
however, requires only a simpler notion of "typicality", which is still based on the Weak Law of Large
Numbers, to establish the achievable rate result. Define the $(k,\mu)$-typical set of sequences $S_0^k$ as

$$\Omega_{k,\mu} := \left\{ S_0^k : \left| \frac{1}{k+1} \sum_{t=0}^{k} \log A(S_t) - E \log A \right| < \mu \right\}. \tag{121}$$

As $S_t$ forms an i.i.d. process, so does $\log A(S_t)$, and thus, by the Weak Law of Large Numbers, it holds
that $\Pr(\Omega_{k,\mu}) \to 1$.

Following the idea in the finite state-space case, to show reliable communication with rate $R$, it
suffices to show that the associated control system is stabilized in the sense of a bounded and vanishing
first moment $E x_k$ and a bounded second moment $E(x_k - E x_k)^2$, for any typical channel state sequence.
These stability notions of the control system imply that the receiver estimate, $\hat{x}_{0,k}$, is "close" to the
transmitted codeword, $x_0$; recall that both $\hat{x}_{0,k}$ and $x_0$ are scalars. Then we need to show that this
"closeness" can be translated into a vanishing probability of decoding error and thus the achievability of
the signalling rate $R$.

Now we show that the stability of the control system can be easily proven; for brevity we only show
that $E x_k$ is bounded and vanishing here. Note that

$$x_k = A(S_{k-1})^{-1} x_{k-1} - b(S_{k-1}) N_{k-1}. \tag{122}$$

The first moment evolves according to

$$E x_k = A(S_{k-1})^{-1} E x_{k-1} \tag{123}$$

when conditioned on any typical channel state sequence and initial condition, and hence

$$E x_k = \left( \prod_{t=0}^{k-1} A(S_t)^{-1} \right) x_0. \tag{124}$$

Then the convergence of the first moment follows from (121) and the fact that $E \log A > 0$. To show that the
stability translates to a vanishing probability of error, we follow the steps in the Markov case and note
that we need to show that

$$\exp\left( (k+1)(1-\epsilon) E \log A - \sum_{t=0}^{k} \log A(S_t) \right) \tag{125}$$

is vanishing, which is true for sufficiently small $\mu > 0$ in view of (121) and the $\epsilon$ slack used. Finally,
the power computation can be done as before. Thus it follows that the proposed scheme is optimal.